Skip to content

No More Limits!

The Programmers’ Nightmare

Expert 3D modellers or architects used to over 30 years of using computer-aided 3D modelling tools tear their hair in frustration when dealing with Second Life’s in-built tools; but nothing like seeing programmers weep and cry at what Linden Lab provides them as a “programming language”.

Linden Scripting Language version 2 (LSL 1 was done over an afternoon and never used by any resident) is ancient. It has about the processing power of a 100 KHz CPU (that’s not MHz) and uses 16 KBytes of memory. That throws us back to 1981, when “personal computers” were kid’s toys at the dawn of ages. But worse than these basic limits are the “Linden limits” on a lot of function calls, which were deliberately “slowed down” mostly to “prevent griefing”. We’ll talk about those in a minute.

In the mean time, if you’re a programmer and curious about how people are able to do such amazing things with LSL, and eager to enter a brand new age of 3D software development — think again. I usually start my programming classes saying that “a LSL programmer spends 10% at doing the code and 90% at developing walkarounds for LSL’s artificial limitations”. Yes, this means that a thousand lines of code in LSL take ten times the development as in any other common, modern programming language.

Imagine that you’re a Web developer and picked, say, PHP for your developing environment. After a moment you suddenly figure out that every time you call a function, the whole webserver blocks for a few seconds, and nobody can view your site during that period. Baffled, you look at your code: are there any bugs there? A strange loop that is waiting for something to happen that never occurs?

You open up the manual, look up the functions used by your code, and just get a warning: when calling a particular function, it stops your webserver for a bit. There is no plausible explanation. The language designers just felt that this function was particularly “dangerous” and, to prevent you to abuse it, block your webserver for a bit.

You scroll down checking each and every one of your functions. And, as you expected… any of the useful functions (the ones you cannot avoid to call, or you web-based application wouldn’t work!) are — you guessed it! — artificially blocking your webserver. All in the name of “protecting your webserver from abuse”. But… what abuse? You’re in control of your webserver, you’re supposed to know what you’re doing, right?…

Going further on, you start to change your code around, just to make sure you call these “blocking” functions as little as possible. You push them into other scripts and call them remotely only when needed. And add some extra code to look up what you’re doing, just to avoid, at all costs, to call those functions in the wrong time.

What happens is that suddenly you find out you have no memory left!

Now what? Well… you can obviously push some of the functions into another script and thus split the memory limits between two scripts… but then you have to add an extra layer: communication between scripts. In effect, you’ll be deploying a whole set of Remote Procedure Calls — those things that you’ve vaguely learned about in college, but that serious programmers never use inside their own code. It’s back to the drawing board, then. And you suddenly figure out that the language does not support Remote Procedure Calls natively — you have to write your own communication layer to deal with that.

While you’re doing this, new problems pop in. When you’re calling a function on another script, you suddenly have no clue where you’re calling it from. This is a synchronisation issue: there is no guarantee that you’ll return to the point you are. Well!… computer software scientists wrote whole books on synchronisation issues. They came up with nice little ideas like “semaphores” or “shared memory” to deal with those. Guess what?… Your programming language doesn’t support either. You simply don’t have anything which is globally shared across scripts. What now?

Oh, and you can’t even store parameters on external files, much less databases. You can read from files (cumbersomely so, and it takes a lot of time), but not write to them. You can call functions on other webservers — provided you don’t call them often. More than ten per second, and you’re out: your script gets blocked, and you get no clue of what’s happening. 

If the World-Wide Web applications had to deal with this sort of issue every day, we would never had seen it growing like it did (the Common Gateway Interface which allows programming languages to be used to develop Web applications was created around 1993 or so). We would be stuck with static HTML on our pages, and simple and primitive search functions perhaps — so long as you didn’t search a lot, of course. And with luck you’d be able to allow your users to change the colours of your web page. Not too often — or it would crash your webserver.

Well, this is the daily experience of a professional LSL programmer!

LSL programmers can bang their collective heads at the walls, but the silly limitations won’t go away, even after years. In fact, LL is quite devious at inventing new ones every day. For instance, since 2003 people have been asking for efficient inter-object communication. LL gave them a way to send text chat and to receive it at the other end — in a painfully slow way. Also, in the process, it lags the whole sim. So they improved it with “linked messages” — but they only work inside the same object. You cannot use a superfast linked message across objects, just across, well, linked prims. You can send emails to other objects — provided their key (UUID) doesn’t change (which it will, so long as you make a copy of the object) — and it takes 10 seconds to deliver. When it does. You have no way to know. And finally, you can make HTTP requests to external webservers — that’s superfast too (almost as fast as linked messages). So LL promptly refuses you to make more than about ten requests per second. Good enough for primitive interfaces (humans are slow to react); totally useless if you wish to develop, say, a fast-paced game that requires a lot of communication. Ah… and you can’t get in touch with in-world objects either, it’s only one-way. Well, sort of. You still have the (deprecated) XML-RPC calls, launched in June 2004 and never changed since then. These used to work blindingly fast too — in June 2004. These days, LL doesn’t even guarantee that any message sent from a remote server using XML-RPC will ever reach the object. The infrastructure supporting XML-RPC is hopelessly outdated and struggles to survive with all the creative uses that people have given to it (like, for instance, SL Exchange or OnRez…). LL recommends not to use it and promises to give us new functions to do pretty much the same — in a few years. Or decades.

If you think it can’t get worse… it does. There are millions of Animation Overriders in SL. Why? Because when LL introduced custom-made animations (also in June 2004 — an excellent month in fantastic new features!) they forgot a tiny detail: how to change the default animations?

One would obviously assume that there would be a built-in preference dialogue box to quickly change them, by dragging and dropping animations on top of it. Not so!… LL forgot about them. LSL programmers to the rescue: by reading the avatar’s “animation state”, you could force it to stop the current animation and start playing a new one. All very nice (conceptually so) until… programmers started to hit the “limitations” of LSL. You don’t get a nice event to tell you when an animation changes: you have to continuously ask for it. Yes, you guessed correctly — that’s insanely laggy.

So you get all sorts of events from SL — like when you touch an object, collide with it, change its shape, colour, or ownership, or even when you drop something inside of it. There are quite a lot of events, and their purpose is always the same one: every time something changes in SL, your LSL script gets informed. You don’t need to check for a change; SL is happy to call your script when an event is waiting for you.

Except, of course, when it matters. There is no event to inform you when an animation changes. Why not? Well… “because”.

Thus, since September 2004, when the first AO was launched, that the grid is plagued with a few millions of AOs that lag the whole grid while they constantly check if their owners’ have changed an animation by chance… and, well, LL has never changed it. Why not? It’s… the Tao of Linden. Adding a new event, or, better, creating a nice, friendly, easy-to-use dialogue box on Preferences to drag and drop animations on top of it, is, well, not a priority — it is still waiting for some Linden developer to take a look at this JIRA request and do something about it. Or, worse, this one, which doesn’t rely on LSL but is a suggestion on how to change the SL client interface to allow client-side AOs (notice that Alexa Linden deemed it to be the same as the other request — it is not the same! — and simply “closed” it, to be buried and ignored under another thousands of useful requests).

Remember the dance machines? They are neat toys where a lot of avatars can select animations to dance. Guess what, a script can only animate one avatar at a time. So how do these dance machines work? They have one script for each possible avatar, and an insanely complex communication protocol to make sure you get assigned a free slot on one of (possibly) 100 scripts inside it. Well, probably not a hundred — if you place more than 40 or so scripts in a single prim, things start to misbehave. So you’ll have to split your dance scripts among several prims, and communicate across them. Wow, so much trouble for a simple device?… Oh yes. That’s what means programming in LSL: the most simple things take a lot of effort just to work around the limitations. Why didn’t LL add the possibility of animating more than one avatar from a single script?… Good question! It can’t be a lag-related issue (more scripts will lag more than one single script), so very likely it’s just because the Linden developer in charge of that bit of code hates clubs and dancing and doesn’t think it’s worth the effort.

When you start entering the physical world… things get even more dramatic. Physics-enabled devices (from vehicles to weapons) are notoriously laggy and have a huge impact on sim performance. So LL has become extra-devious in limiting them all. The major reason here is to avoid griefing — so these functions all have in-built delays and limitations. That’s why it’s hard to create a super-weapon that fires a lot of bullets per second: LSL doesn’t allow it. But… there are workarounds.

In fact, a clever concept is at the kernel of LSL programming: “script farms”. The limitations are almost all “per script”, e.g. like the example for the dance machine. So what the clever LSL programmers do is just to spread a function across more scripts. You can only fire a bullet every three seconds? No problem, have 300 scripts firing bullets at the same time, and cycle among them, and you’ll be able to have a constant firing rate of 100 bullets/second — pretty impressive when you’re at the battleground. And pretty impressive on what it does to the lag on the server, too.

Likewise, LL has fought very aggressively how griefers are allowed to do replicating objects. There is the “grey goo fence”: a series of measures to try to figure out if someone is replicating too fast. If the system that analyses the pattern of self-replication thinks it’s got the signature of a griefer, the functions are blocked — because “regular” use of the rezzing feature is usually not so intense. Well, what do griefers do?… they just replicate slower but use more starting objects.

On the Web-side requests, LL even went further with their paranoid measures. Here their fear was that people would use the Grid to launch distributed denial-of-service attacks on third-party servers. Imagine 25,000 sims, all launching an attack simultaneously doing hundreds of requests per second on, say, Anshe Chung’s website or Prokofy Neva’s blog (always popular targets of griefers). They would immediately crash the servers — and Linden Lab would take the blame, since the attack would have originated on their grid.

Well, to prevent this, LL limits “too many requests” per second. But they do it even more cleverly: they restrict it per avatar. So you can’t use “script farms” that way — all your objects’ requests are pooled together, and the limit is applied to all of them.

Of course, what do griefers do?… create a hundred alts per sim, and each launches their own attacks. It takes a bit more time, but… LSL programmers are used to workarounds, even dramatic ones.

So what we have in fact is a pretty simple language (anyone familiar with programming languages will pick up the basics in an afternoon), but that requires a daunting amount of workarounds until you start to be able to do something remotely useful with it. These workarounds take months and months to learn — many are “trade secrets” which you don’t usually get explained during scripting classes. Quite a lot are impossible to figure out from reading code — freebies tend to use few of those tricks, and the best scripters will not share the workarounds they managed to discover. So be prepared to face years of trial-and-error experimenting until you grasp the basics of “working around LSL limitations”. But a good and persistent programmer will eventually get there.

What this means is that LL has artificially created a huge gap. On one side we have the beginning programmers, the ones that are so happy to have created a door that actually opens and closes. They don’t care about complex things — they wish just to experiment with the simple ones. Doing things like slideshow presenters are easy to do with a few lines of code, and quickly understood by beginners.

On the Dark Side, we have the griefers, exploiting every possible hole in the architecture to, well, bring the grid down, or at least a few sims, or at the very least, annoy a few avatars. To be able to handle them, Linden Lab added a lot of complex limitations and checks so that griefers are seriously hampered in their attempts.

But are they really?… Almost everything has a workaround, it just takes ages to deal with those. So the top programmers spend ten times the normal amount of time to deal with the workarounds — but so do the griefers, who are also top programmers. Granted, the “casual user” and the “script kiddie” will probably give up quickly enough when trying to launch their Ultimate Grid Attack™, but… seriously… how many of these are around? Serious griefers are serious programmers, too; both know how to subvert LL’s limitations. They’re not really worried about how terrific LL’s limitations are: there are always ways around them, and griefers have ample time and nothing to do with it, so they can spend weeks and months figuring out a way to subvert the limitations.

In the mean time, professional scripters, earning their living in Second Life, spend countless hours hitting roadblocks and dead ends in their desperate attempts to have LSL at least do something useful. Programming time skyrockets into the unforeseen future, as clients wait for the programmers to deliver. And sometimes there is simply no way out but to go to libsecondlife and do there what you can’t do in LSL. No wonder ‘bots are more and more popular: the limitations are sometimes simply too hard to work around in LSL, or even impossible (like the famous scripts that only work when the avatar owning the object is in the same sim; if they log off, everything stops working… a silly limitation for some functions that has no plausible reason for existing, and only by having a ‘bot in the sim will things work correctly…).

Now I’m not underplaying the eternal fight between griefers and developers. There has to be a way to limit griefers’ attacks, and this mostly means making their job so hard that they give up and go elsewhere to have their laughs at our expense. But at the same time, this means that everybody else needs to suffer because of those naughty limitations…

A few, however, are really just “a whim”. I can only seriously hope that LL will start dropping those as soon as Mono is rolled over the whole grid (almost done!) and debugs it thoroughly (shouldn’t take much longer!). Others, well, they will probably never implement — like a way to get rid of the laggy animation overriders by simply adding that feature on the SL client. We’ll have to wait until someone patches the SL client to do that. Similar things will have to wait until the Mono engine allows other languages to be deployed besides LSL. Mono-based scripts are compiled server-side, so there is a higher degree of control that way — LL could, in theory, apply their clever pseudo-AI (the one that tries to identify if a running script is self-replicating itself too quickly and shut it down) at the compilation stage. This would mean that they could search for patterns of suspicious code and disallow its compilation at all — before harm is done. Think of how modern anti-virus software works: it looks for signatures (patterns of behaviour that are often used by virus software writers) to identify potential new virus. LL could do something similar: any script that tries to self-replicate too quickly could be flagged for review and not compile at all, thus keeping the Grid safe.

Granted, like in the real world, this is an arms race: griefers will try to develop new griefing tools that don’t use any known methods to force the server-side Mono compiler to validate their scripts and compile the; but after such an attack, LL would be able to see what kind of new trick the griefer has come up with, and add it to their database of “signatures” — the next time a griefer uses the same trick, they won’t be able to compile that script. More interesting is that this would not require a server patch (like it does today) — just an update on the “griefer script signatures”, which might be done in real-time (ie. all sims are forced to pull the latest database of “griefer script signatures” when compiling a new script to Mono bytecode).

And that way, most silly limitations could be lifted — and allow LSL programmers to be insanely more productive that way.

Conclusions

We live in a limited world. Sadly for the content creators, they spend too much time working around the silly limitations — most of them created at whim by some angry developer in a bad mood; a few really outdated limitations that don’t apply any more; others just the consequences of bad design; and a few just the result of putting SL to uses that LL never dreamed of.

Overhauling the whole of Second Life is not just redesigning the architecture (which is crucial, of course, since it will allow scalability and more stability). It’s taking a good, close look at how people are actually using SL today, and what their nightmares are. Content creators — but also common users! — are tied to all sorts of limiting factors in their virtual lives, many of which don’t make any sense. The struggle against all these limits is too constraining sometimes, and the last option is just to give it up.

It’s true that the most fascinating artistic creations have come from surpassing obstacles and limitations and constraints. It’s the way our minds work: human beings are problem solvers, and “life” has often be described as a series of obstacles that we have to pass, and by doing so, we enjoy a sense of fulfilment that gives us pleasure. Well, that is true; on the other hand, if the obstacles are set too high, it leads to frustration, disappointment, and, ultimately, abandoning the attempt.

Fine-tuning how much LL can limit our creativity without leading to frustration is quite an art :)

[Page 1] [Page 2] [Page 3]

Related posts

Readers who viewed this page, also viewed:

  • Yak Wise

    Gwyneth, your the best !!!

  • http://www.metaverserepublic.org Ashcroft Burnham

    An excellent article – I entirely agree. The scripting limitations in particular are no doubt wholly oppressive to the entire internal economy.

    As to the issues of prims and meshes, your solution is excellent. The question would then arise as to how these exported meshes would be counted in the prim economy. Many have advocated moving to a polygon economy, although the difficulty with that would be that it would be difficult to transition.

    A sensible solution would be to replace prim limits with rendering cost limits. The current system of avatar rendering cost (“ARC”) should be extended and applied to all objects (with adaptations as necessary). The scores generated by objects should roughly equate to their current prim values, and the limits per server should also roughly equate to the current prim limits. The uploaded meshes would have a specific cost (which might well be substantially lower than the equivalent prim cost), which would be based on textures and scripts as well as polygons.

    On the subject of scripts, it would be sensible if each script was given a cost rating based on the performance that it will require, which cost rating might be a far more effective way of undermining griefing activities than arbitrary limitations applicable to all scripts.

    Returning to server limits, one useful architecture to develop would be a system in which servers can assist the rendering of other servers, enabling estate owners to increase the sim’s prim limit one prim at a time, for a proportionate cost. Simultaneously, new rendering options could be developed to reduce the impact on client-side framerate dropouts, including a distance simpling system whereby the geometric complexity of rendered objects falls off at a variable distance depending on the overall load placed on the graphics card, and likewise with the texture resolutions.

    Scripters ought have the option of making certain scripts run client-side only (which would be useful for circumstances in which it is not important that the object is displaying the same state to all users), and, for client-side scripts, they could also be deactivated by distance in the same way.

    Estate owners ought have the option of disabling certain features on avatars with an ARC over an amount prescribed by the estate owner: for example, “if ARC > 1,500, disable all shiny effects for that avatar”. That should be scriptable so that the limit value can be changed or the feature turned on and off in computationally determined circumstances, including ones that relate to the combined total ARC value on the estate, and the total number of avatars on the estate.

    Many of the resource limitations could be reduced substantially in their adverse effect if the limited resources to which they relate were managed more efficiently, both as you suggest in your article, and I suggest above. Alas, efficient resource management does not appear to be a strength of Linden Lab (or, indeed, very many people at all).

  • http://slhomepage.com Clubside Granville

    Hey Gwyn, thanks for the post, hopefully it will educate some of the people. Your suggestion for in-world mesh creation is great, though today we need to go beyond meshes into bones and other modern 3D conventions. As for LSL, what can I say, it is putrid garbage. MONO doesn’t get to be a saviour yet, until the Second Life functions are made into a library so standard .NET languages can be used it’s still LSL. And there really needs to be a time limit on non-MONO script support since as long as the LSL engine runs the performance drag remains.

    What I really want to talk about is the simulator. You know I am a big proponent of the contiguous grid, without it you don’t have a “virtual world”, you have a bunch of 3D rooms. I won’t get into your Snowcrash analogies as it offends me each time people suggest that dude inspired anything. He wasn’t even the hundredth guy to talk about virtual worlds or avatars. And while he may have used multiples of 16, the true computer building block (HEX numericals), I would hope programmers a decade later wouldn’t be looking at that kind of math for inspiration. A connected grid is a must, it was the simplistic choices the Lindens made that got us into this mess. One side comment, a lot of your calculations are based on everyone setting their draw distance to 512 to see the entire region and that all the prims are on the ground or a level where they would be rendered, and I don’t know where you got the 5 million polygons a second figure from, since polygos are an abstract and most 3D renders are calculated by the number of triangles per second. Neither here nor there, but Second Life has usually listed a robust minimum system requirement where the listed cards draw billions of polygons a second.

    To get off the ground fast they chose as many off-the-shelf (or close to it) tools as possible. An awful 3D rendering package (ven awful RenderWare would have been a better choice), a broken physics engine (Havok 1 was never really finished for Linux, and Havok 2 was already announced and near completion with full Linux support in 2002) and who knows what for the IM infrastructure. Slap these things together and decide that one processor for one region gives them the best control of the simulation and the recipe for disaster is there from day one.

    Even this bad choice could have worked better, and I want to throw up an example before explaining the infrastructure that should have been obvious from day one. That example is a little game for last generation’s consoles called Star Wars Battlefront. Released in 2004 by LucasArts and developed by Pandemic, this game was designed for online play. While there were PS2, PC and Xbox versions, I’m going to talk about the Xbox version. The Xbox was essentially an i386 computer with a 2002 nVidia GPU, hardly state-of-the-art at the time. As for networking, Xbox Live uses P2P networking rather than a central server meaning all machines have to send avatar movements, vehicle locations, bullets and more to each other, far slower than a small upload stream to a server collecting the data and passing it down fast. Despite these limitations Star Wars Battlefront allowed the home user to host matches where 16 users fought alongside 16 computer-controlled combatants on maps (regions) four times larger or more than a Second Life region. The puny i386 was calculating AI for the NPCs, tracking mines laid, bullets shot, vehicles exploding and more. Before the old “user generated content” argument comes up, the only difference between Star Wars Battlefront’s content and content on a Second Life server is the location, streamed from disc versus streamed from network. There is no magic in Second Life’s distribution of content. So here we have an underpowered systm with no central server outperforming Second Life in essentially the same development window. Sad.

    How could it have been done? How should have it been done? I don’t want to get too technical or Prok will yell at me, but here goes. Distributed computing. They already did this with inventory but why not the simulation features. Servers designed to run the scripts communicationg with the ones running physics and others the objects on a region. When a region is unoccupied and not within anyone’s draw distance resources are shuffled to support the active regions. No pre-determined region size at all, just contiguous land and distributed simulation calculations. It would have been much cheaper as there could be load balancing and it could scale. Just as there is only one google.com there are thousands of servers in the backend making us think there is only one. Empty regions with running scripts getting fewer resources. Local cache of regions so the server doesn’t have to push all the polygons constantly with client-side occlusion. I could go on and on… unfortunately.

    Last bit from this windbag comment: the Animation Override issue shouldnt really take much to fix. Of all the assinine development decisions that need repair this one is both a no-brainer and houldn’t be much wok to implement. And client-side cache them as well.

  • http://gwynethllewelyn.net Gwyneth Llewelyn

    Thanks for the very insightful comment, Clubside!

  • Aldo Zond

    Fabulous post – i’ve learnt alot. Thanx

    Can i raise the point of gestures and chat. You can go to some places and people overdose on gestures. I’ve nothing against gestures and chat per se but it astounds me that people dont understand the effort involved in the background in punting each and every line to all the people in range of the speaker. The proximity calculations needed to work out whether an individual can ‘hear’ a line or not must be responsible for some lag on a sim.

    A 10 line ASCII art gesture in a sim of 30 people needs to make upto 300 proximity calculations (Obviously there are possible optimisations – dont know if they are being used) – shouting makes the issue worse because its over 96 meters and then theres the actual effort of pushing the lines to each client.

    lots of lag?

    Or am i missing something?

  • Allen

    Gwyneth,

    Thank you for writing all of that down. I’ve been in SL for just over a year (I’m posting as my RL avatar here), and have dabbled with building and scripting. However, I’ve never had a good grasp of what was going on. …and I still don’t, but I’m closer thanks to your post.

  • Jazzman J.

    Great post Gwyn. Like Allen above I’ve always wondered about some of this stuff and it’s very good of you to put this together.

Status Updates

 Second Life® View GwynethLlewelyn's profile on slideshare

My images

Smiling at homeWorking towards a national OpenSim gridTesting Beta 3.2.7View of Neufreistadt
Digg! Listed on BlogShares Site Meter

 

Gwyneth Llewelyn
Gwyneth Llewelyn is offline in Second Life.
Creative Commons License http://gwynethllewelyn.net is licensed by Gwyneth Llewelyn under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.