Upgrading Second Life, and Why It Is So Hard To Do So

As anyone who has deployed very complex software knows, you can test as much as you want on “controlled laboratory environments” — even stress-test it to incredible levels — but there is no better test than “the real thing”. It’s the nature of so-called “complex environments”. A computer programme running on a single computer on a single CPU with no other programme running at the same time can be analysed to extreme detail, and you can scientifically predict what its output will be given a certain input. Do the same test under a multi-tasking system, and that prediction will be right most of the time — but sometimes it won’t. Increase the complexity so that not only the system is multi-tasking, but it’s a networked environment, and it will be even more hard to predict the outcome. Now go towards the full scale of complexity: multi-tasking, multi-CPU, multiple nodes on the network, multiple virtual servers per physical server — and the “predictions” become chaotic.

In essence, what happens is it’s easy to predict the weather inside your own living room (just turn up the heat, and the room’s temperature will rise uniformly) but not on the whole of the Earth — a chaotic system which is not possible to analyse using a statistical method, and you don’t know the variables to develop a chaotic model that replicates Earth’s weather. Now Second Life is, indeed, a chaotic system, with a limited predictability base. Like a weather system, you can simulate it. You can recreate a system that does not have 5000 sims but just 50, and not 15,000 simultaneous users but at most 150, and see how it behaves in a controlled environment. LL even uses very old computers on the testing grids to make sure you can pinpoint algorithm errors (ie. doing things far less efficiently than you could do in a fast system) — it’s an old trick of the trade. However, a system 100 times bigger than the lab environment is not 100 times more complex — the relationships are exponential, not linear. At a time there were just 20,000 accounts in the database, one could probably extrapolate the tests made on a smaller system by applying heuristics — “if the real grid is 100 times bigger, this will be 100 times slower”. With 2 million accounts, things sadly don’t work so well — a grid a 100 times as big as the testing environment might be a millions even a billion times more complex. Exponentials kill almost all systems, and turn semi-predictable ones into chaotic ones. That’s the first issue: SL is complex. Much more complex than people tend to think.

The second issue is simply a matter of “time”. To reboot the whole grid needs 5 hours — you can’t “cut” time on that. It’s like a constant of the universe — and kudos to LL to make that “constant”, well, constant, during all these years that the grid grew exponentially. So, to suggest that every day the grid should get updated during one hour is, sadly, technically not possible, and that “demand” comes mostly from people familiar with other MMORPGs, where there is usually a “maintenance hour” every day to reboot all servers. Most MMORPGs have very simple back-end systems and the complexity is on the viewer. Most of them also run on Windows servers; stability on Windows platforms can be accomplished surprisingly by rebooting the servers often (ie. once per day) since it is a good way to deal with all the memory leaks inherent to that OS.

| ← Previous | | | Next → |
%d bloggers like this: