Setting the pace of development to “very fast”

Spend one moment away from your computer, and Linden Lab comes up with some new feature!

What’s up with Linden Lab?

For years we have been complaining that they don’t fix things quickly enough; or, when they do, we complain that they aren’t adding anything exciting to Second Life. When they actually come up with something new, we complain that they’re not “working with the community”, that they’re ignoring the residents, or, worse, that they’re reinventing the wheel when TPV developers have had the same thing in place for ages.

And, of course, we can always complain about tier.

Well, except for tier — and there is even some interesting news about that — it seems we have a completely new management style at the ‘Lab.

The strange thing is that they seem to have fewer developers. Or, if they have the same number, they’re all relatively “new” — I mean, is Andrew still around? — in the sense that they came on board only with M Linden, which was not so long ago.

And they have been busy. They developed four new games in about a year or so — but, in the meantime, while everybody was expecting LL’s efforts on SL to stall and stop, with the focus shifting entirely to the new games… rather the opposite is going on. There is Project Shining in the works, but it’s so vastly encompassing that it’s hard to say where it starts and stops. From ‘bots to meshes to materials to server-side baking and AOs and reducing lag and a new chat interface… gosh, where to start?

I know, I’ve not been able to blog much; there are good, solid reasons for that, but they have absolutely nothing to do with “reduced” interest in Second Life — it’s just that pretty much everything else is taking sooooo much time away from what little time I had for SL. I fondly remember my 16-hours-for-SL-per-day days back in 2004 🙂 (yes, and that included blogging time)

So, these days, I manage to start writing an article per month or so. I try to cover whatever exciting new thing catches my attention, mostly by reading Inara Pey’s blog and Hamlet’s New World Notes. Both are quite up to date in covering whatever is happening out there; Hamlet, as a professional journalist, tends to be too sensationalistic sometimes, or just plain pessimistic (bad news sells!), while Inara, ever the optimist, is much more balanced, often giving both sides of the issue. Inara also spends a lot of time in-world at the Office Hours, both LL’s and those of some of the TPV developers, so she mostly gets the news first-hand. And there are a lot of new things to report. So many, in fact, that by the time I finally get to type a paragraph or two, it’s already obsolete.

A typical example: materials processing. This will mean far more realistic texturing, while saving some polygons on meshes, too. The implications of the SL renderer finally coming close to the competition — pretty much every renderer of the past decade supports materials — were interesting to discuss while we had no idea when LL would release the code for testing. Well, it’s out now. What’s the point of philosophically discussing what it does and doesn’t do, if all you need is to grab a viewer and log in to Aditi to try it out for yourself? There went another article into the bin.

LL has also been playing a lot with the way the viewer fetches information about nearby locations. For a decade, we have been plagued with “random” fetching, where distant objects rezz first and nearer ones, which will cover up (“occlude”) the distant ones anyway, come in last. Or not. It was completely random. I actually did some research work trying to figure out what exactly is happening (OpenSim has similar behaviour, so you can look at the code), and the problem seems mainly to be that there is no way to know which content will arrive first, due to the nature of LL’s communication protocols. There has always been something LL calls “an interest list”: objects in range, close enough to be seen. The catch is that “in range” can mean anything from a wall right in front of your nose (covering up everything behind it) to a 1-m plywood cube 128 m away from the camera. LL did lots of tweaking on their algorithm, and the results improved over time, but the “randomness” never quite went away.

Well, Project Shining changed all that. As LL is moving to use HTTP for pretty much everything (except positioning data), the content-fetching algorithm got redesigned as well. At last, objects nearer to you will rezz first, and anything hidden behind them won’t be fetched at all, conserving precious FPS. How exactly they manage to do that is beyond me (I haven’t looked at the code myself), but it looks promising.

Here is what has become even more interesting: while making all these dramatic changes — server-side avatar baking, new content rendering, a new chat interface — Linden Lab, surprisingly, isn’t doing all the work alone. They’re keeping in touch with the major TPV developers! Perhaps for the first time in SL’s history, materials processing, for example, has been co-developed with TPV developers (the original code came mostly from the Exodus viewer, but also from Catznip and Firestorm). They’re also happy to credit them — and to give them access to early releases of the code, so that they can do their own testing with their own viewers. The new CHUI (Chat User Interface), for instance, replicates some features already available on other viewers, and adds a few more, which requires reworking a lot of code; so LL is working in tandem with all those teams.

Now that’s quite a different attitude from the past, when there seemed to be an ongoing war between LL and TPV developers! During the M Linden days, the assumption was that TPV developers would simply be “co-opted” for LL’s own projects. This obviously didn’t work; TPV developers, like most residents, tend to mistrust LL 🙂

So this is the “new” relationship that LL has with its most faithful followers and fellow co-developers: they work together. LL is finally, openly, delegating some work to TPV developers instead of just “pushing” out their own ideas without asking anyone. Rather the contrary: they seem to be genuinely interested in knowing what TPV developers are doing, how they’re doing it, and in exchanging ideas, plans, projects, deployment times, and so forth. I’m sure it’s not all that rosy, of course, and there are plenty of hiccups in the process; and it’s too early to see the LL viewer adopting things clearly developed by third parties, like RLV or inventory/object export/import.

On the other hand, most viewers have implemented some sort of viewer-based Animation Overrider. LL, after nine years of ignoring AOs, came up with a novel idea: server-side animation overriding. This is actually something that ought to have been done eons ago: allowing avatars to skip the standard animations and play different ones instead. If you have never scripted your own AO, you might now be baffled — isn’t that how AOs are supposed to work anyway?

Actually, no. What AOs do is detect the “animation state” of the avatar; the standard animation gets loaded in any case, no matter what. But you can stop it, and then launch a different animation. Since that happens relatively quickly, you have the illusion that the original, standard, “silly duck” walking animation has been “replaced” by your AO’s gorgeous sexy moves, but that’s not the case: the original animation is still launched, gets stopped, and the AO’s own is played instead.

This wreaks havoc with the animation priority system (each animation has a priority from 1 to 4; higher numbers override lower ones; Linden anims usually have priorities from 1 to 3), and that’s why most AOs just use priority-4 anims, something I hate, because it means that your other gestures, bought elsewhere, will not work. And LL never introduced a way to change an animation’s priority: it’s embedded in the asset when the original creator uploads it.
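To make the mechanics concrete, here is a minimal, heavily simplified sketch of that classic approach in LSL (the replacement animation name, my_sexy_walk, is purely hypothetical; a real AO handles every animation state, not just walking, and uses high-priority animations for the reasons above):

```lsl
// Sketch of a "classic" scripted AO, worn as an attachment: poll the avatar's
// animation state and, whenever it changes, stop the built-in Linden animation
// and play a replacement from the prim's inventory instead.
string last_state = "";

default
{
    attach(key id)
    {
        if (id != NULL_KEY)
            llRequestPermissions(id, PERMISSION_TRIGGER_ANIMATION);
    }

    run_time_permissions(integer perm)
    {
        if (perm & PERMISSION_TRIGGER_ANIMATION)
            llSetTimerEvent(0.25); // poll a few times per second
    }

    timer()
    {
        string current = llGetAnimation(llGetOwner());
        if (current == last_state) return;
        last_state = current;

        if (current == "Walking")
        {
            // The standard "walk" has already started playing by now;
            // all the script can do is stop it and start its own.
            llStopAnimation("walk");
            llStartAnimation("my_sexy_walk"); // hypothetical high-priority anim
        }
        else
        {
            llStopAnimation("my_sexy_walk");
        }
    }
}
```

The tell-tale sign of this approach is exactly what is described above: the default animation always starts first, and the override only kicks in a fraction of a second later.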

The new code will get rid of all that, finally allowing AOs and poseballs to work properly, as the animations can truly be overridden and not merely stopped and “played over”. This will not work for everything, since some animations are client-side, but it will go a long way towards improving future AOs and poseballs; the old system will, of course, remain in place.
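For comparison, the server-side mechanism is exposed to scripts through llSetAnimationOverride() and a matching PERMISSION_OVERRIDE_ANIMATIONS permission, so the whole polling dance above simply disappears. A minimal sketch (the animation name is, again, just a hypothetical example):

```lsl
// Sketch of a server-side AO: the "Walking" state is overridden at the
// simulator itself, so the standard Linden walk is never played at all.
default
{
    attach(key id)
    {
        if (id != NULL_KEY)
            llRequestPermissions(id, PERMISSION_OVERRIDE_ANIMATIONS);
    }

    run_time_permissions(integer perm)
    {
        if (perm & PERMISSION_OVERRIDE_ANIMATIONS)
            llSetAnimationOverride("Walking", "my_sexy_walk"); // anim in the prim's inventory
    }
}
```

(And llResetAnimationOverride() puts the defaults back.)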

This seems to be one of many things being pushed onto the server, like avatar baking, another long-standing pet peeve of residents. One might reasonably ask: with so many things pushed onto the server, will lag reign supreme — server-side?

Actually, no. The point here is that the simulator servers are getting lighter and lighter. As LL pushes more communications into HTTP, these things can be cached in the cloud by Amazon’s services, which means that LL can add as many cloud instances as needed to handle the extra load, while the simulators themselves do less and less.

This requires a bit of an explanation. If you read Philip’s and Cory’s seminal white paper on Second Life, their idea was to deploy a cloud infrastructure to handle the grid’s communication with the viewers. The word “cloud” wasn’t popular back then, but “grid computing” was. Their original idea was that viewers would be like 3D browsers, and simulators, running the grid, would be like web servers. Lots of web servers, sharing the load. So the original “central asset servers” would never be contacted directly (unlike what happens with other virtual worlds and MMORPGs): the viewer would contact the simulator, the simulator would fetch the asset from the central servers and pass it on to the viewer, keeping a cached copy of it for future use. This was pretty much how SL worked until 2008 or 2009.

This didn’t scale well, as we know: simulators were too busy serving assets — content, textures, handling chat, and so forth — and had little time left for, well, simulating: keeping track of where avatars are and what they are doing, running scripts, and the like. These essential bits, due to the vast amount of content constantly being transmitted, had little CPU time left to run, and so we experienced almost constant server-side lag.

As LL moved to HTTP-based communications, the content could — finally! — be cached on heavy-duty Web proxy servers. This is essentially what Amazon S3 provides: it caches all the content from the asset servers and serves it directly to the viewer. The simulator doesn’t need to worry about that any longer (though it remains backwards compatible with the old protocol) and, as such, is much lighter.

The more LL pushes towards HTTP communications, the more enthusiastic they get. Inventory, these days, gets pulled via Web protocols — and can be cached locally much more easily. Similarly, instead of merely caching textures, sounds and animations, the viewer now caches whole objects. So much reliance upon HTTP, however, hit a barrier: old home routers cannot cope with too many simultaneous Web connections. So LL is slowly changing the whole underlying protocol to use persistent HTTP connections instead. This is standard practice in games and applications; we’ll have to see what the impact will be for Second Life. The point, though, is that LL is quite willing to redo everything from scratch if it means improving the experience somehow.

Of course, there are still areas for improvement. Meshes still need a good, working deformer. Even so, the impact of meshes on the environment seems to be overwhelmingly positive: my friends who are in touch with the top fashion designers report that everybody who knows how to do rigged meshes is making cartloads of money, as residents dump their “old” clothes and avatars and buy meshes like crazy. The revival of the economy is much appreciated — on some (rare) days there is even some positive growth of the landmass, probably from new-generation fashion designers opening their shops — and more kinky locations for showing off meshed adult content, which is becoming, uh, more interactive and realistic. No surprise there: adult content has always driven sales and attracted many early adopters. Even if adult content is not the bulk of the SL economy, the rest of it certainly follows its lead in development and innovation.

What apparently will not change is tier. I won’t repeat the arguments here — simply put, LL’s business model doesn’t work with lower tiers; the prices are not elastic. Nevertheless, it was quite encouraging to see that LL quietly brought the educational/non-profit tier discount back. It’s not open to everybody, and it comes too late to stop the exodus of most educational and academic projects to OpenSim (especially Kitely, which is dirt cheap for the incredible performance it offers), but, in spite of everything, it shows that LL is reverting to their more sensible days. And the academic community is really not to be shunned. A few years after the first researchers showed off their projects in SL, they would still raise eyebrows at conferences and have a hard time getting their work published. Today — even though pretty much all research has moved to OpenSimulator — there are at least five scientific journals just for Second Life/OpenSimulator. Oh, sure, they accept submissions from projects done with other technologies, but the vast majority of immersive virtual world research goes on in SL/OpenSim (and, these days, almost exclusively in OpenSim). This means that there is a constant influx of new virtual world users who get acquainted with OpenSim during their studies — many of whom will continue their careers working in that area (or closely related ones). The reason? SL and OpenSim are long-lasting technologies. When I started studying computer engineering in the late 1980s, one of my teachers told me that a software solution needs 7-10 years to reach maturity; that’s why Unix and C/C++ were taught: they were pretty mature by then. Microsoft attempted to “shortcut” this maturity stage by massively investing in the development of Windows to “hurry it up”, but that didn’t work; Windows only became a truly mature technology in the past decade. So, ironically, LL was quite helpful to academics while SL wasn’t a very mature solution, and dumped them once SL, as a testbed for academic research in immersive virtual worlds, became the standard. Reverting that position now is a good idea, but probably too late: the only kinds of researchers “needing” SL (as opposed to OpenSim) are the ones requiring vast numbers of residents (like sociological, psychological, and anthropological studies) and/or vast amounts of free content. All the rest are better served by OpenSim.

But as SL enters its second decade of existence, it looks like the good news far outweighs the bad. On one hand, sure, we still have lag, and it won’t go away. Region crossings still have hiccups, and mega-regions are not on LL’s to-do list, nor is a more dynamic approach to allocating CPU to simulators with a lot of traffic, which would be critical if LL wants to expand their grid and allow far more avatars in the same region (OpenSim can display a thousand; SL peters out at 100, and that’s stretching it). Buying items from the SL Marketplace is much easier these days with Direct Delivery, but, really, we need a much better inventory system. Talking of which, the whole viewer needs deep rethinking. When LL made the source code available, the first thing users expected was a new interface coming from the third-party developers. But after six years, all we get is a “variation on a theme”: TPVs, except for Radegast and the mobile/text-based viewers, pretty much use LL’s user interface (either the 1.X variant or the 2.X/3.X one) and just shuffle things around, even though Nirans’ Viewer has done so much reshuffling that it almost looks like something different altogether. A Web-based viewer (one that would run in a browser on tablets, smartphones, or consoles) hasn’t been delivered yet, although Pixieviewer might be one project delivering just that. Competition continues to appear sporadically: Cloud Party has survived for over six months, beating Google’s Lively and my own predictions, and they’re slowly rolling out things like their new Block Tools, which will make collaborative building easier. Philip Rosedale is launching his newest start-up, which will develop… a virtual world, using a voxel renderer. Will that replace SL? Merge with it? Become a new competitor? (Is Philip allowed to compete with himself?) And while OpenSimulator-based grids might not be growing like crazy, and aren’t a threat to LL, they’re nevertheless here to stay. Kitely, for example, has an unbeatable business model totally unlike any other, one that can deal with growth and contraction without fear and ultimately remain alive indefinitely with minuscule running costs.

But on the other hand, LL is developing like crazy. And I mean it. Four new games, sharing no code with SL, developed in a year — all while LL launched Project Shining, rolled out new features every week, implemented mesh (several times; there are differences between each “generation”) and continues to develop it actively, delivered better ‘bots with built-in pathfinding, moved most of the infrastructure to HTTP-based communications that can be served from Amazon S3, plus server-side baking, server-side animation overriding, a new chat interface, materials processing, Direct Delivery… that’s an extraordinary amount of work, most of it started just when Rod Humble became the new CEO. This pretty much echoes LL’s pace of development during 2004/5, but remember: SL was a tiny virtual world back then, and it was so much easier to change things. Residents were also far more tolerant — these days, who would accept the whole grid being down for a whole week while LL fixed a problem? Even “maintenance Wednesdays” — almost every week, the grid went down for half a day while LL deployed new versions of the simulator software — would be utterly unacceptable today. Now LL tests things extensively with users and TPVs (on Aditi) after doing it internally, then releases code partially on the Agni grid, until finally it gets rolled out to all simulators. Even though this means rather long testing times, that hasn’t stopped LL from developing a lot of things in parallel — I’ve lost track of how many projects are currently under way, and every week there is something new.

Not even LL seems to be up to date on everything; their own “official” Tools & Technology blog is seriously outdated and mentions only a few of the many changes LL is working on. Bad communication? No, I guess that even the technical writers at LL are overwhelmed by the pace of development.

To conclude… Second Life is not merely the technology, but mostly the people who use it every day, be it for leisure & entertainment, hard work to earn some well-deserved L$, or academic pursuits. Still, this vast user community doesn’t exist in a vacuum: they need the technology to pursue their goals. Many of the improvements LL is working on will push SL’s look ever closer to what current-generation games have been implementing over the past decade — which will mostly allow content creators to deploy even more sophisticated content (and make more money out of it!). Many others will improve the overall immersive experience: no, lag will not disappear, but perhaps it might just become a nuisance (and not a crippling annoyance) in this second decade of SL. In the meantime, minor changes here and there show that LL has realized that there is always room for improvement in SL, and that it pays off to keep their cash cow happily milked, so that they can extend their offerings in other areas, like games, whose development is entirely financed by LL’s considerable profits from SL (and which, in turn, may add further sources of revenue, allowing LL to improve SL further, and so forth).

While a “mainstream SL” with a billion users is hardly on the event horizon, I think that LL is doing a good job of keeping a “core” membership of a million faithful users in a niche market which they exploit well, and which can remain around for many, many years. This is a sharp departure from the “darker days”, full of hype and promises, of jumping into what apparently was the “wrong” market. Personally, of course, I think that the major issue they will have to face is how to deal with tier as the basis of their business model. It’s not to be addressed lightly. In the past, I suggested that using different metrics to charge residents might be a way to deal with the problem; for instance, why should a region that is always empty of avatars (and thus sits idle most of the time, consuming hardly any bandwidth) cost as much as one that constantly has 40+ avatars engaged in activities which consume precious resources? Unfortunately, LL’s grid is not designed to dynamically assign resources where they are most needed. So I don’t expect tier to change unless — or until! — LL changes the way their grid works. Examples like OpenSim’s Distributed Scene Graph (DSG) or Kitely’s cloud-based, region-on-demand simulators show that one can build a vast grid-based virtual world using dynamic allocation of resources, and that, as a result — as Kitely has successfully demonstrated in practice — different business models become possible. This, however, requires a lot of under-the-hood changes. Will LL be able to implement them in the upcoming decade? Possibly; but, in the meantime, they prefer the alternative approach of pushing as much bandwidth as they can onto proxy/caching services running on Amazon S3 to ease the load on the existing simulators, while caring little about what those simulators’ CPUs are actually doing. This means, for now at least, significantly better performance without the need to address the core issue — but it also means that LL remains stuck with their business model.

Well, I have already written on that subject several times in the past but, since the point seems not to have caught anyone’s attention (much less LL’s developers!), I’m addressing it again. LL has, what, 3,000+ servers? Cool. Install OpenStack on them. Simulators already run as virtual instances on individual servers; something like OpenStack would simply shuffle sims around, depending on the amount of CPU and bandwidth they need. If a simulator is suddenly teeming with avatars who have just dropped by, that instance can be pushed to a server with plenty of resources which isn’t running any other region — while “dormant” sims, with little traffic, are quietly shuffled to servers running a hundred or a thousand instances. The whole grid, after all, has focus points — places where a lot is always going on — but most of it, perhaps well over 95%, shows little activity during most of the day. This means that an in-house cloud-based solution could handle the asymmetry of the load very efficiently: 5% of the sims (about 1,500) might be running on 50% of the server farm, with 16 CPUs per simulator (!) and the full bandwidth of the network card to go with it, while the remaining 50% keeps the 95% of “dormant” sims in stasis, to be “awakened” instantly if the need arises. That’s the beauty of cloud-based computing. With a little more effort, of course, LL could dump all the hardware and just do it directly on Amazon’s own infrastructure at a fraction of the cost, but I imagine that their long-standing investment in hardware is something they’re not so willing to get rid of…
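For what it’s worth, those (equally invented) numbers do roughly add up. With about 3,000 servers, dedicating half of them to the busiest regions would give each of those 1,500 sims a whole (presumably 16-core) server to itself, which is where the “16 CPUs per simulator” comes from, while the “dormant” 95% would pack very comfortably onto the other half:

$$
\frac{1\,500 \text{ busy sims}}{1\,500 \text{ servers}} = 1 \text{ sim per server}, \qquad
\frac{28\,500 \text{ dormant sims}}{1\,500 \text{ servers}} = 19 \text{ sims per server}
$$

That is, an average of just 19 dormant sims per machine, well below the “hundred or a thousand” instances a single server could keep in stasis.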

Let it happen quietly, without residents noticing the change, and start gathering statistics. After a while, LL will know exactly how many resources are actually needed, and can then completely change their business model. It might resemble what Amazon charges — or what Kitely does: a form of pay-per-use. The big issue here is to figure out which model works best while still allowing LL to derive the same income.

This is tricky, since, in the above scenario (note that the percentages are all invented; I haven’t done the maths), it would mean that 5% of the regions would need to support 50% of LL’s tier income, which is something like US$5 million/month in total. That means charging those 1,500 sims something like US$1,666/month. Clearly, people who have put a lot of effort into attracting so much traffic would not be willing to pay more than five times as much as they pay today for the same service — even if they get far better performance that way. On the other hand, the remaining 28,500 sims could be charged just US$87/month — for a full sim, with 15,000 units of Land Use and the possibility of 100 simultaneous avatars. They wouldn’t get “crippled” sims that way.
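Just as a sanity check of the internal consistency of those (invented) figures, assuming roughly 30,000 private regions in total (the 1,500 plus the 28,500 above) and about US$5 million/month of tier income, split 50/50 between the busiest 5% and the rest:

$$
\frac{0.5 \times 5\,000\,000}{1\,500} \approx \mathrm{US\$}\,1\,666\text{/month per busy sim}, \qquad
\frac{0.5 \times 5\,000\,000}{28\,500} \approx \mathrm{US\$}\,87\text{/month per quiet sim}
$$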

Obviously, this business model would require a lot of adjustment. Ideally, those top 1,500 sims wouldn’t wish to pay more than the US$295/month they are charged today, while all the rest would expect to pay far less — say, US$75/month. That would only be possible if the size of the grid doubled (if my maths aren’t wrong). While I can certainly imagine that a lot of people with low-traffic regions would gladly replace, say, one US$295/mo. region with two US$75/mo. ones (twice the land at half the price!), it’s unlikely that all sim owners would do that. More likely, the vast majority would keep the regions they own now, simply enjoy a 75% discount, and be very happy. This is the main hurdle LL would have to overcome, since the alternative is charging the busiest regions way, way more — which would be a huge disincentive for them to stay and, more than that, to keep attracting so many visitors. It’s also worth mentioning that Kitely charges just US$40/month for an always-on region (far less if you buy, say, 16 regions, which cost only US$100/month in total!), while on-demand regions are… free, at the lowest end of the price list. These prices are unbeatable!
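And a quick check of that “grid would have to double” claim, under the same invented assumptions (keeping tier income at roughly US$5 million/month):

$$
1\,500 \times 295 + 28\,500 \times 75 \approx \mathrm{US\$}\,2.58\text{ million/month}, \qquad
2 \times \mathrm{US\$}\,2.58\text{ million} \approx \mathrm{US\$}\,5.2\text{ million/month}
$$

At those prices, today’s region count would bring in only about half of the required income, so roughly twice as many regions would indeed be needed to compensate.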

Still, I think that even if LL keeps the current price structure in place, it’s worth considering a move from the current grid to a cloud-based one, using exactly the same hardware and bandwidth: at the very least, they would provide far better service to the highest-traffic regions (encouraging them to stay). They might even increase things like the Land Use allocation and the number of simultaneous avatars, because the cloud hypervisor would be able to shuffle high-traffic regions dynamically onto better hardware and give them exclusive use of a whole server’s resources. Those would be good selling points. And they can certainly do the development on Aditi first, which runs on older hardware and is far less stable than the “main” grid, thus allowing testing under poor conditions.

The future will tell, but it looks rather bright so far! 🙂

[Slightly edited, and replaced the link to Rosedale & Ondrejka’s seminal white paper from 2003, since, due to bit rot, it dropped off the now-defunct Gamasutra site; I’m adding a PDF of the latest snapshot I could find on the Wayback Machine — Gwyn (June 2022)]