Interconnection and managing load
Linden Lab launched the Open Grid Public Beta with much advertising on my 4th rezday (talk about coincidences…), which was for many (including yours truly) the beginnings of a dream coming true. The whole OpenSim development could become quickly pointless if LL didn’t give any signs of allowing integration between OpenSim-based grids and LL’s own Second Life Grid®. It’s obvious that our friends from IBM have been steadily pushing for this kind of thing — allegedly, they run a mix of OpenSim and LL’s own servers behind a firewall and wish to fully integrate all of them.
The Open Grid Protocol is LL’s answer to a possible integration between their own grid and OpenSim grids. At this stage, it’s beyond me to fully understand and explain how it works “under the hood”; you’ll have to rely on hard core programmers like Tao Takashi to fully grasp the technology behind it. The Public Beta has, however, been a slight disappointment for me, and let me briefly explain why.
First of all, it’s important to understand that this is a closed beta, not an open beta. You can’t apply for it any more! Allegedly, one month or so of testing with 100 residents was “enough” to prove that it works at all. But the way it works… is quite limited.
For a start, Linden Lab has to manually add your avatar to the list of “gridnauts”, avatars specially enabled to jump across OGP-compatible grids. This doesn’t take much time (a few hours), but LL is not able to do that for every avatar. So just a tiny group is allowed to use this amazing technology.
Also, LL hasn’t integrated their SL “main” grid with OGP, but just the Preview Grid. What this means is that you’re not really “teleporting from Second Life to OpenSim-based grids”, but just from one grid run by LL. Still, we know that all content on the Main Grid is available on the Preview Grid, and this first stage of OGP integration was only about authenticating avatars, so that’s enough — your avatar login will work as well on the Preview Grid as on the Main Grid, so, for all purposes it’s a valid test.
You also need a special version of the SL client (dubbed “Open Grid”) to manage the intergrid teleports. While LL definitely released several patches over a few months to keep that new viewer closely compatible with the standard viewer, it means yet another viewer sitting in your applications folder. The differences are minimal, of course.
Intergrid teleporting is not “painless” but requires a few manual steps. Nothing very troublesome for a SL veteran, but still a bit complex for regular users. But the real disappointment comes when you suddenly realise that you’re just being provided remote authentication and nothing more. What this means is that your avatar on the remote OpenSim grid will have no inventory (and no way to add anything to an always-empty inventory) and be unable to change appearance. You’re just a “ghost” travelling across a foreign landscape. You can talk to other residents, of course, and interact with anything you see. You just can’t exchange inventory.
Granted, things have to be done a step at a time, but it is a bit frustrating to have waited a year and a half or so just to be a cloudie on a “foreign” grid… with no clues on when LL will continue with their work. Allegedly, IMs are the next step (as said before, LL and OpenSim developers are obsessed with IMs for some reason, when all they needed to do was focus on Jabber/XMPP and get that obsession out of their systems…). Inventory transfers? Well, perhaps in 2020, unless something changes (see below).
So that’s the state-of-the-art on LL’s side for now. If they had an automatic way of enabling “gridnauts”, this would at least be minimally interesting, since you could avoid all the trouble of registering with the myriad OpenSim-based grids out there, and just use your LL account as a single sign-on solution for all OpenSim-based grids and LL’s own. That would at least be worth the trouble!
Sadly, LL has no intention of automating the “gridnaut upgrade” in the near future. This just remains a proof-of-concept which was field-tested by a handful of residents and nothing more — for now.
Hypergrid — intergrid teleporting done right
Just six weeks after the Open Grid Public Beta was closed, the OpenSim developers, frustrated with the slow pace of LL’s intergrid development, as well as the unnecessary complexity of the whole Open Grid Protocol, officially launched Hypergrid. According to legend, from concept to implementation it just took three weeks (probably for a handful of developers). And yes, as that page so well explains it, it’s full teleport between any OpenSim-based grid.
It’s devilishly cleverly done. When teleporting to another OpenSim-based grid, your avatar retains the information of what asset/inventory server you use. The “foreign” grid just needs to ask your UGAIM servers for that information (caching it locally) and rez it. That’s all it takes! It’s mind-bogglingly simple, and I can very well believe that it really just took three weeks to code that, since almost everything else was in place: the ability for avatars to “know” the URL of their UGAIM servers is, of course, at the very core of OpenSim’s application classes. Contrast that with LL’s solution, where the same information is hardcoded for one grid only, i.e. their own. That’s why they can’t have separate UGAIM servers for their Dallas and San Francisco co-location facilities, had to use VPNs instead, and are now spending an insane amount of money on a fibre link between both facilities to get rid of the VPN troubles. All that could have been avoided with just three weeks of proper coding! The team behind Hypergrid only needed to make a few changes here and there.
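To make the idea a bit more concrete, here is a minimal sketch of that lookup-and-cache step in Python. It is not OpenSim code (OpenSim itself is written in C#), and every name in it (AssetServer, ForeignGrid, rez_avatar, the example URL) is made up for illustration. A plain dictionary stands in for the network calls a real grid would make.

```python
from dataclasses import dataclass, field


@dataclass
class AssetServer:
    """Stand-in for a home grid's asset/inventory service (hypothetical)."""
    url: str
    assets: dict[str, bytes]
    requests_served: int = 0

    def fetch(self, asset_id: str) -> bytes:
        self.requests_served += 1
        return self.assets[asset_id]


@dataclass
class Avatar:
    name: str
    home_asset_url: str        # this travels with the avatar on teleport
    worn_asset_ids: list[str]


@dataclass
class ForeignGrid:
    """A grid being visited; it starts out holding none of the visitor's assets."""
    known_servers: dict[str, AssetServer]   # url -> server, stands in for HTTP
    cache: dict[tuple[str, str], bytes] = field(default_factory=dict)

    def rez_avatar(self, avatar: Avatar) -> list[bytes]:
        server = self.known_servers[avatar.home_asset_url]
        rezzed = []
        for asset_id in avatar.worn_asset_ids:
            key = (avatar.home_asset_url, asset_id)
            if key not in self.cache:       # ask the home grid only once
                self.cache[key] = server.fetch(asset_id)
            rezzed.append(self.cache[key])
        return rezzed


home = AssetServer("http://home.example/assets", {"hair": b"H", "shirt": b"S"})
visitor = Avatar("Visitor Resident", home.url, ["hair", "shirt"])
grid = ForeignGrid({home.url: home})

first = grid.rez_avatar(visitor)    # fetched from the home asset server
second = grid.rez_avatar(visitor)   # served from the foreign grid's cache
```

The point of the sketch is only that the avatar carries the address of its own servers, so the foreign grid never needs a copy of the inventory in advance.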
Even more fun is the notion of “virtual world hyperlinks”. Basically what you do is to add an “alias to a sim” (similar to a desktop alias, if you wish, or a symlink for you Unix gurus out there) on your own grid. That “sim alias” actually points to a sim on the foreign grid, but the Map will show it as if it belongs to “your” grid instead. So your users just need to teleport to that “sim alias” on the map, and without noticing much of a difference, they’ll suddenly be on a different grid. But… with their avatar’s appearance, attachments, and full inventory!
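A toy model of such a map might look like the following Python sketch. The class names, coordinates and grid address are all invented for the example and say nothing about how OpenSim actually stores its map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSim:
    name: str


@dataclass(frozen=True)
class Hyperlink:
    """A map entry that points at a region hosted on another grid."""
    remote_grid: str      # hypothetical address of the foreign grid
    remote_region: str


# The local map: grid coordinates -> whatever occupies that tile.
grid_map = {
    (1000, 1000): LocalSim("Welcome Plaza"),
    (1001, 1000): Hyperlink("grid.example.org:9000", "Piazza"),
}


def teleport_target(coords):
    """Describe where a teleport to the given map tile actually lands."""
    entry = grid_map[coords]
    if isinstance(entry, Hyperlink):
        return f"{entry.remote_region} on {entry.remote_grid}"
    return f"{entry.name} on the local grid"
```

Both tiles look the same on the map; only the lookup reveals that one of them lives elsewhere, which is much like following a symlink.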
The concept is insanely cool, especially because it doesn’t require any effort. All avatars are automatically Hypergrid-enabled. All current versions of OpenSim support Hypergrid natively. Activating Hypergrid on your grid is just a matter of adding a flag and a configuration line and rebooting your OpenSim application (which takes, oh, perhaps a minute or two, depending on how many sims you run on that server and how many prims each sim has). That’s all! The system administrator can then interactively log in to the OpenSim console and start adding “sim aliases”, which the OpenSim team calls “virtual world hyperlinks”.
It’s so easy that even a child can do it. Oh, sure, there are glitches, which is only to be expected for something that was developed in three weeks! One serious glitch is that the SL client (this has nothing to do with either OpenSim or the Second Life protocol) has some problems teleporting to sims that are more than 4096 sims away. On LL’s main grid, this never happens, since no two sims are further apart than that. On OpenSim-based grids, you can use the whole gridspace, and people started to use it, so this previously unknown bug popped up (LL might fix it in a year or two; they’re aware of it, and it’s reported on the JIRA). Right now, what this means is that if you wish your own OpenSim-based grid to be both OGP-enabled and Hypergrid-enabled, you’ll have to spread a few empty sims at 4096-sim intervals, and ask your users to jump between those, a step at a time. But that’s the only limitation!
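For the curious, the stepping-stone workaround boils down to a bit of arithmetic, sketched below in Python. I am assuming here that the 4096 limit applies to each map axis separately; if the viewer measures distance differently, the clamp would have to change. The function name plan_hops is my own.

```python
MAX_HOP = 4096  # largest jump (in sims) the viewer is assumed to handle


def plan_hops(start, dest, max_hop=MAX_HOP):
    """Return the grid coordinates to teleport through, ending at dest.

    Assumption: distance is measured per axis, so a hop is legal when
    neither the x nor the y offset exceeds max_hop.
    """
    def step(current, target):
        delta = target - current
        # Clamp the move on this axis to the range [-max_hop, +max_hop].
        return current + max(-max_hop, min(max_hop, delta))

    x, y = start
    hops = []
    while (x, y) != dest:
        x, y = step(x, dest[0]), step(y, dest[1])
        hops.append((x, y))
    return hops


route = plan_hops((1000, 1000), (10000, 1500))
# route == [(5096, 1500), (9192, 1500), (10000, 1500)]
```

Every intermediate coordinate in the route is a spot where the grid owner would need to park one of those empty sims.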
Even permissions are properly handled. So, yes, you can bring your own content and rez it on the “foreign grids”, and it will still list you as the creator. Of course, remember always that the OpenSim manager will have access to their own databases and be able to copy your content, but, alas, there is nothing we can do about that. You’ll just have to be careful to teleport only to OpenSim grids you trust.
In any case, this is so good and so well done that I almost had a heart attack the first time I tried it out 🙂 Sadly, the only OpenSim-based grid I managed to teleport to was an Italian one, and my lack of understanding of Italian didn’t allow me to talk to the residents there (and there weren’t many around anyway).
OpenSim’s Hypergrid gave me in 2008 a glimpse of what LL’s OGP will be in, oh, perhaps 2020 or 2030 or so.
Load balancing and dealing with performance
For ages, Morgaine Dinova has been complaining about the lack of proper use of resources in LL’s own grid. She argues very thoroughly that SL is wasting a huge amount of resources, to the point that the technology is not scalable. The argument is actually quite easy to follow. The top number of simultaneous users in SL, by the end of 2008, is 70,000 or so, for around 30,000 sims. We know that except for OpenSpace sims, LL runs one sim per CPU, so this means something like 2 to 3 avatars per CPU on average. But the reality is, of course, much worse. Thousands and thousands of regions are usually empty, while a few hundred have all those residents spread unevenly among them. So very likely tens of thousands of CPUs sit completely unused, while a few hundred are frying their motherboards trying desperately to take all the load of the thousands of avatars on a handful of sims.
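The back-of-the-envelope numbers are easy to check. In the Python snippet below, the two totals are the ones quoted above; the figure of 500 busy sims is purely my own illustrative guess at what “a few hundred” might mean, not a measured value.

```python
peak_users = 70_000   # concurrent avatars, figure quoted above
sims = 30_000         # roughly one CPU each, figure quoted above

average_per_cpu = peak_users / sims

# Hypothetical split, for illustration only: suppose 500 sims host everyone.
busy_sims = 500
avatars_per_busy_cpu = peak_users / busy_sims
idle_share = (sims - busy_sims) / sims

print(f"average avatars per CPU: {average_per_cpu:.1f}")     # 2.3
print(f"avatars per busy CPU:    {avatars_per_busy_cpu:.0f}")  # 140
print(f"share of idle CPUs:      {idle_share:.1%}")            # 98.3%
```

Under that assumption the busy CPUs carry some sixty times the average load while over 98% of the grid does nothing at all.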
This happens because SL’s grid uses a tiled model for the map, where each sim has a fixed amount of land (256 x 256 m) to manage. You cannot predict in advance where the avatars are going to be, so most of the CPUs will be wasted as they just run the software for an empty sim. Although disk space might pretty much be “free” (so low-cost as to be irrelevant), CPU and bandwidth are not, and all those overpopulated sims are taking a serious hit, while the rest of the grid is totally idle.
Well, Morgaine concludes that in terms of architecture this is completely mindless, and from a business perspective, you’re wasting your money by running empty sims, and never providing enough CPU + bandwidth where it really matters — on the popular sims.
LL vaguely claimed that their idea of 3D content hosting was to provide “isolated” sims (with a fixed amount of resources, e.g. 65,536 m² of land and 15,000 prims) that would be independently leased, “similar to web hosting”. In practice, however, web hosting doesn’t really work like that. On a shared host, what you actually rent is disk space and traffic. The host will load more or fewer instances of the web server depending on how much traffic you’re actually getting on your websites. This happens dynamically. Hosting providers give you next-to-infinite disk space to use, but charge you by traffic. The best example is, of course, Amazon’s services, which basically cost “nothing” if you just leave all the files on their server, but they start charging you for using CPU and bandwidth: the more you use, the more you pay. Thus, a provider can adequately “share” a single server among several users, give each of them only the resources they need, and bill them accordingly.
In Second Life, however, you pay the same for an empty sim (i.e. no avatars, no prims rezzed, no scripts running) as for a fully loaded one (15,000 prims built and 100 avatars rezzed 24h/day).
Now, OpenSim, to be SL-compatible, uses the same model. In fact, it would be the only way to make it easily compatible with the SL viewer. Things like the size of the sim are hardcoded in the viewer. The best you can do (and this is quite easy under OpenSim) is to move your region sims around between servers so that you have fewer heavy-traffic sims on a single server, and leave the “empty”, mostly unused sims to be launched elsewhere as a group (not unlike what LL attempted to do with their OpenSpace product, and we all know how that didn’t work out).
So in Morgaine’s view, OpenSim’s architecture, since it’s so closely modelled on LL’s own, is also not scalable, and the opportunity should have been used to implement a better architecture.
Enter load balancing and region splitting. Developed independently by the Japanese company 3Di, Inc., this is cutting-edge backpatching on top of OpenSim that allows a far higher degree of scalability and handling of “overloaded” sims. Two strategies have been employed. The first is a dynamic load balancer that pretty much does what I’ve suggested above, but automatically. A set of applications monitors the current load of a sim (one of a set of several running on the same server), and if it reaches a certain threshold, pushes it to an “empty” server. The way this is done is high-tech magic, not unlike the magic employed to move a running script from one sim to another: you capture the state the sim is currently in, create a clone of that sim on another server, and unpack the captured state, telling all clients that were connected to the sim to use the new server address instead. Amazingly, this all works quite well. I guess that avatars on the sim will experience a short delay like a teleport and then suddenly watch the lag disappear. Once an event is over, the now quiet sim can then be pushed back to its original server again.
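I have no access to 3Di’s code, so what follows is only my own toy sketch in Python of what a threshold-based balancer could look like. The names (Server, rebalance), the integer “load” units and the rule of moving the busiest sim to the least loaded server are all assumptions made for the example. The actual state capture and client redirection are reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    sims: dict[str, int] = field(default_factory=dict)  # sim name -> load units

    @property
    def load(self) -> int:
        return sum(self.sims.values())


def rebalance(servers, threshold):
    """Move the busiest sim off each overloaded server onto the idlest one.

    Returns the migrations performed as (sim, from_server, to_server).
    """
    moves = []
    for source in servers:
        while source.load > threshold and len(source.sims) > 1:
            sim = max(source.sims, key=source.sims.get)
            target = min((s for s in servers if s is not source),
                         key=lambda s: s.load, default=None)
            if target is None or target.load + source.sims[sim] > threshold:
                break  # nowhere to put it without overloading the target
            # A real system would capture the sim state here, restore it on
            # the target, and tell connected clients the new address.
            target.sims[sim] = source.sims.pop(sim)
            moves.append((sim, source.name, target.name))
    return moves


shared = Server("shared", {"party": 70, "shop": 20, "beach": 20})
spare = Server("spare")
moves = rebalance([shared, spare], threshold=80)
# moves == [("party", "shared", "spare")]
```

After the move, the party sim has a server to itself and the two quiet sims are left in peace on the shared one, which is the effect described above.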
What this means is that an OpenSim-based 3D content hosting provider will try to gauge the usage of their grid in advance. We can expect a similar degree of usage as on LL’s own grid. So perhaps out of ten servers, one is overloaded. All you need to do is to figure out how many sims you’ve got, divide by ten, and buy that number of high-performance servers to give your users a good experience. The other nine out of every ten servers can be “low performance servers”, until they’re needed.
Region splitting is even more radical. Under this model, it’s assumed that a single server is so overloaded that it can’t handle the load any more, not even when it’s running just a single sim (and uses, say, 16 CPUs and all bandwidth to serve all those avatars). In that case, the 3Di software splits up the region and distributes copies of it among several servers. Each server just needs to update a part of the region, and communicate the changes to the other servers. Avatars are neatly split across all those servers too, and get updates from the server they’re connected to. What this means is that you can have thousands of avatars on the same region. Or perhaps millions, if you split the region enough times. In practice, however, the communications between the many “split” regions to give the overall picture back to each avatar will overload the network and hit a limit at some point. Nevertheless, it definitely means that 1000-avatar-events are possible without lag on OpenSim. And all this is done automatically and dynamically: you don’t need to set up anything in advance for your residents, the automated software will handle it all for you if someone gives a party and invites a thousand friends, and will neatly clean it up after the event is over.
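Again without knowing how 3Di really does it, here is a small Python sketch of one way such a split could work. I am assuming the region is cut into equal vertical strips and that every server has to exchange updates with every other one; both are my simplifications, and the function names are made up.

```python
REGION_SIZE = 256  # metres per side, as mentioned earlier


def assign_to_strips(avatars, servers):
    """Map each avatar to the server owning the vertical strip it stands in.

    avatars: dict of name -> (x, y) position inside the region
    servers: number of servers sharing the region
    """
    strip_width = REGION_SIZE / servers
    assignment = {i: [] for i in range(servers)}
    for name, (x, _y) in avatars.items():
        # Clamp so that x == 256 still falls into the last strip.
        index = min(int(x // strip_width), servers - 1)
        assignment[index].append(name)
    return assignment


def sync_links(servers):
    """Number of server pairs that must exchange updates with each other."""
    return servers * (servers - 1) // 2


avatars = {"a": (10, 50), "b": (100, 200), "c": (130, 5), "d": (255, 255)}
split = assign_to_strips(avatars, 4)
# split == {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]}
```

With 4 servers there are 6 pairs to keep in sync, with 32 servers there are already 496, which gives a feel for why the network becomes the limit long before you reach millions of avatars.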
Contrast just these two amazing improvements with LL’s own. Second Life Grid will simply never manage that in the next decade or so…
So, well, lots of promising technological goodies inside OpenSim (oh, yes, there are more). But how well do these actually work in reality? Let’s find out…