Once more I dive deep into troubled waters and stay a bit away from everything interesting…
Well, at least the interesting things about Second Life® and OpenSimulator, of course 🙂
In fact, in my spare time (which is pretty much the little time I can afford to spend away from my academic research) I had to deal with the infamous WordPress Distributed Denial of Service (DDoS) attacks. You might think they’re over, but no. Just yesterday, a newly installed WP blog which had been moved from a different server got hit with a gazillion requests, in a brute-force attempt to guess the ‘admin’ password. All of them failed, and the WP backend didn’t even register much traffic, but the huge spike in traffic got the security team insanely worried about WordPress, since it was the only WordPress-enabled website on a rather large server.
The botnet doing these attacks doesn’t discriminate. Once it suspects there’s a WordPress blog hosted on the server, it will pretty much attack everything there, no matter what application or technology each site is using. Putting everything behind CloudFlare helps (and almost all sites were already protected), but it’s no guarantee that the crackers will leave you in peace. So, now you know: if you have neglected your website lately, take good care of it; the botnet is still around and still doing a lot of harm.
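For what it’s worth, spotting this kind of brute-force attempt doesn’t take anything fancy; a few lines of script over the access log will do. Here is a minimal sketch, assuming a typical nginx-style log file and an arbitrary threshold (both are my assumptions, not what the security team actually used):

```python
# Minimal sketch: count POSTs to wp-login.php per client IP in an access log
# and print the ones above a threshold, so they can be blocked upstream.
# The log path and threshold are assumptions for illustration only.
from collections import Counter

LOG_FILE = "/var/log/nginx/access.log"   # hypothetical path
THRESHOLD = 50                           # hypothetical limit per log window

hits = Counter()
with open(LOG_FILE) as log:
    for line in log:
        if "POST" in line and "wp-login.php" in line:
            ip = line.split()[0]         # first field is the client IP
            hits[ip] += 1

for ip, count in hits.most_common():
    if count >= THRESHOLD:
        print(f"{ip} tried wp-login.php {count} times -- consider blocking")
```

The actual blocking would then happen upstream, at the firewall or at CloudFlare.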
The opportunity to move my main websites out of DreamHost and into a slice of a virtual machine (no, I haven’t abandoned DreamHost; the shared server I’m leasing from them has shown stellar performance like never before, and even though my own websites have very little traffic, the awesome DH team have certainly been busy cleaning things up) gave me the chance to learn a lot of things. Among them, I loved the challenge of “going back to 1997”, which, back then, meant having to tweak everything to fit inside little available memory and slow CPUs, and still be able to serve dozens of concurrent users. Nowadays, every server seems to have 16, 32, or 64 GBytes of RAM (or possibly more!) and at least 8 CPU cores, more likely 16 or even more. Disk access is slowly moving towards SSD-only storage (like the guys from Digital Ocean are doing); prices keep dropping, and that means you can spoil your customers by giving them more tech for the same fees.
This makes system administrators lazy. Oh, so Apache is spawning processes with 512 MB of RAM and eating 25% of a CPU? Who cares; we have 16 GB, and plenty of CPUs to spare. So “tweaking” the system is no longer seen as worth the trouble. Being security-conscious is another matter, but one trades the time spent tweaking for performance against simply adding a few more GBytes of RAM and another handful of CPUs, and lets the hardware deal with the problem. Simple. If you can afford it, that is.
Virtual machines, or “slices” as some call them, are the cool new toy for geeks. Well, not really “new”: we have had virtual server hosting since 1999 or so. The advantage, of course, is giving end-users a “full” machine with root access, where you can install pretty much anything you wish, rather than merely a shared environment, where the hosting provider’s sysadmins pre-install the software and just give you a control panel to access all the features. Obviously the latter is in much higher demand: after all, these days, few people have time to learn how to install, configure, and maintain a full server. They just want it to work. I know a few web design companies, in business for well over a decade, which have no clue what a “shell” is. They couldn’t care less; they just want good performance, high security, and a simple way to point and click on a panel and get things done. I can very well understand them; that’s why I love DreamHost. Their control panel (custom-made by them; they had control panels years before cPanel or Plesk came out) pretty much does everything, and has plenty of options to keep even a seasoned sysadmin happy.
At the other end of the scale, you have the elite teams who assemble everything on their own: the kind of engineers that Google hires. Or Linden Lab. They choose their own hardware, install and fine-tune everything, and ship it to a co-location facility. They don’t trust “managed services”; they wish to do it all themselves, because every application is different, and every application needs a fine-tuned operating system to run that specific application well. Of course, such an approach is very expensive: you need to buy your own servers, keep a team of engineers on duty 24/7, and deal with everything on your own. That costs a lot of money.
The market between those two extremes is tiny. Few geeks are willing to pay a premium (hundreds of dollars per month) for managing their own servers. And there are few geeks around, anyway. They play around with their own servers at home, using battered old desktops turned into Linux servers. With the bandwidth now offered by fibre operators, they can even set up a lot of things at home: low-traffic web servers do not require much bandwidth anyway. Even things like OpenSimulator can run from a home server and accept a few connections from outsiders (more on that below); a lot of network games with free servers are run that way world-wide.
But often that’s not enough. Upstream bandwidth from home is usually not very good; naturally enough, the fibre and ADSL operators are not happy with “home hosting”, since they usually sell that kind of service from their own data centres. On the other hand, if you have a very specific application, say, one that does not run on Apache + PHP (OpenSimulator, again, comes to mind, but so does Diaspora*), it means that you will definitely need your own server.
Well, the prices for virtual machines have dropped recently. It’s not hard to understand why. If you buy in bulk, you can get, say, a 6-core server with 6 GB of RAM for less than 200 dollars/month. What a “slice” provider will do is allocate that server to 12 customers, giving each of them one CPU (cores, unlike RAM, can be oversold, since customers rarely peak at the same time) and 512 MB of RAM, for perhaps $20/mo. But a server with twice the RAM and CPUs will not cost twice as much; so, as memory and CPUs increase in density, virtual machines drop in price. And the drop has been dramatic.
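Just to put that back-of-the-envelope arithmetic in one place (these are the illustrative figures from the paragraph above, not anyone’s real price list):

```python
# Back-of-the-envelope margin for an oversold "slice" provider.
# All figures are the illustrative ones from the text, not real quotes.
server_cost = 200          # USD/month for a 6-core, 6 GB box bought in bulk
customers = 12             # slices carved out of that box
price_per_slice = 20       # USD/month charged per 512 MB slice

revenue = customers * price_per_slice
margin = revenue - server_cost
print(f"Revenue: ${revenue}/mo, margin: ${margin}/mo "
      f"({margin / server_cost:.0%} of the hardware cost)")
# Revenue: $240/mo, margin: $40/mo (20% of the hardware cost)
```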
Let’s take DreamHost as an example. My “unlimited everything” account (i.e. unlimited websites, unlimited disk space, unlimited users, unlimited email accounts, unlimited databases, unlimited traffic…) with them costs less than $10/month. That’s not too expensive for an “unlimited everything” account; you can get way cheaper service, sometimes for as little as $1/month, if you are willing to accept a few limitations. But, of course, there is a catch: your account will be shared with dozens of other users. On the DreamHost server I’m on, there are around 2,000 websites, all competing for a bit of CPU and RAM, all the time.
DreamHost also sells what they call a “Virtual Private Server”. This allows you to reserve, say, 256 MB of RAM just for you, plus a percentage of the CPU (you can use more, but that won’t be guaranteed). But that little bit is definitely “yours”. And, as a bonus, they allow you to add a lot of new functionality which is not available on the simple shared service. You don’t get a root account, though; the DreamHost engineers think that most people don’t need one anyway, and prefer to sell customers a more complex control panel with even more options. And, of course, they’ll charge you some extra $15/mo. for the privilege. This is not a new service, btw; they have been selling it for several years now, and it has allowed them to compete with other virtual server operators, who are still offering similar service for $40-50/month.
But now, with cloud computing becoming wildly popular, prices have fallen even more. There are many reasons for using cloud services, but I’m not going into that; needless to say, most of those services are more expensive than plain virtual servers. Why? Because they give you extra features. Like the ability to spread your servers around several datacentres, keeping them available virtually all the time. Or the ability to clone them on-the-fly to handle more traffic. We have to thank Amazon for coming up with the model, but the truth is, these days, there are plenty of open source virtualisation/cloud solutions: all you need is some hardware to run them on, and you can set up your own Amazon competitor in next to zero time (and even be fully compatible with Amazon’s own API!).
So what some operators are doing right now is converting all their hardware to run under a cloud management hypervisor: a nifty piece of software which is aware of all the hardware under its control (which can have wildly different specifications) and of what software is running on it. What the hypervisor does is dynamically allocate the best hardware to run your virtual server. If your server has low traffic and is using few resources, it gets shuffled to a lower-end physical server, probably shared with many other virtual servers. If it gets hit by a traffic spike, it might get “promoted” to a high-end server for a while, running at top performance, until the traffic goes back to normal levels. This is pretty much what Amazon’s servers are doing all the time. The beauty of this approach is that, as new hardware is put into place, the old hardware does not need to be thrown out of the window: it can still be used as part of the cloud, thus expanding its ability to serve more customers (until the hardware reaches its end-of-life; but even if a physical server fails, the hypervisor will be able to shuffle the virtual servers to other machines and just flag the failing server as “unavailable”). This is not only incredibly resilient, but it also uses the existing resources most efficiently.
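A crude way to picture that shuffling (keep the busy virtual servers on the beefy hardware, pack the quiet ones onto whatever is left) is a toy scheduler like the one below. The host names, loads and threshold are all invented for illustration, and real hypervisors are vastly more sophisticated:

```python
# Toy sketch of load-based placement: "promote" busy virtual servers to the
# high-end host, pack the quiet ones onto older hardware.
# Names, loads and threshold are invented; real hypervisors do far more.
BUSY_THRESHOLD = 5          # arbitrary "load units"

hosts = {"new-box": [], "old-box-1": [], "old-box-2": []}
vms = {"traffic-spike": 12, "busy-shop": 9, "quiet-blog": 1, "dev-sandbox": 2}

old_boxes = ["old-box-1", "old-box-2"]
for name, load in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
    if load >= BUSY_THRESHOLD:
        hosts["new-box"].append(name)   # promoted while the spike lasts
    else:
        # put the quiet VM on whichever old box currently holds fewer of them
        target = min(old_boxes, key=lambda h: len(hosts[h]))
        hosts[target].append(name)

print(hosts)
# {'new-box': ['traffic-spike', 'busy-shop'],
#  'old-box-1': ['dev-sandbox'], 'old-box-2': ['quiet-blog']}
```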
To go back to my old example: running the Second Life Grid®. At any point in time, perhaps a few hundred regions are crammed full of avatars. But the rest have less than a handful of visitors each. Most are empty. So the hypervisor would just shuffle the simulators running high-traffic regions to the high-end servers, while keeping the remaining sims on lower-end servers. As Linden Lab adds more powerful hardware, the “old” servers can still be fully used, and continue to work perfectly well, instead of the current “Russian Roulette” approach, where you never know where your region’s sim is going to end up, and where it takes a request to the Concierge to reboot it and hope it gets allocated to a different, more powerful server.
So, in the end, with a very heterogeneous grid, service is provided on a statistical basis. The old calculations (add up the CPUs and RAM, divide by the number of users, add your margin) do not apply to these business models. Basically you have a huge server farm with different kinds of servers. Then you have a gazillion virtual machines being shuffled around. All this costs a fixed amount of money per month in co-location facilities. The trick is to figure out how much to charge, knowing that it’s impossible to predict what kind of customers you’ll get (you might get the next Zuckerberg leasing a virtual server…). But of course you can make an estimate, a prediction, based on statistics: say, the probability of selling a server to the next Facebook is tremendously low, while the probability of getting someone to host a website called “What I did on my vacations” (number of followers: 3, who log in once per month) is extremely high.
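In other words, the provider is pricing against expected usage rather than worst-case usage. A trivial expected-value sketch, with probabilities and figures I simply invented, shows why overselling works:

```python
# Expected resource usage per customer, using invented probabilities:
# almost everyone runs a tiny site, almost nobody is the next Facebook.
customer_profiles = [
    # (probability, average RAM actually used, in MB)
    (0.90,   64),    # tiny personal sites, mostly idle
    (0.09,  256),    # modest blogs and shops
    (0.01, 2048),    # the rare genuinely busy customer
]

expected_ram = sum(p * ram for p, ram in customer_profiles)
print(f"Expected RAM per customer: {expected_ram:.0f} MB "
      f"(versus the 512 MB nominally sold)")
# Expected RAM per customer: 101 MB (versus the 512 MB nominally sold)
```

As long as the actual mix of customers stays close to those odds, the provider can safely sell far more nominal capacity than the hardware physically has.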
Juggling with the numbers, what this means is that this new generation of virtual machine providers has come up with incredibly low prices. Because the market for geeks is small, and the probability that any one of those geeks actually requires a lot of performance is low, the providers can give them stellar performance in a cramped environment and still make a profit, charging less than $10 a month for the privilege of having their own server with root access.
There is even another positive twist to this. Because those geeks know they will get just a tiny slice of what a “real” server is, they will have to tweak it to extract every erg of performance they can. But this means that they will have to opt for extremely efficient solutions, which, in turn, will not be “heavy” on the overall infrastructure. The more geeks they get, the more efficiently those geeks will use the available infrastructure, and that, in turn, means the ability to lower prices even further.
Whole websites are devoted to discussing tweaking tips to get the best performance out of such virtual servers. Obviously there is no one-size-fits-all solution: it depends a lot on what you’re actually trying to accomplish. But in 1997 a one-CPU server with 512 MB was a respectable piece of equipment, if well-tuned (I remember a mail server with those specs which was used to handle 20,000 mailboxes or so…). These days, applications and server software are incredibly bloated, but there are lots of tricks to exploit. And, for me, it was great fun to do so, turning the clock back a decade and a half and seeing how many simultaneous connections I could cram into the tiny slice without ever hitting swap. It was an education 🙂 The results will probably be published by a popular tutorial site; I’ll announce that once the article gets approved.
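The actual tweaking tips will have to wait for that article, but the core arithmetic of “how many concurrent connections before the slice starts swapping” is very simple. The per-process figures below are rough guesses for a lean setup, not measurements from my own slice:

```python
# Rough capacity check for a 512 MB slice: how many worker processes fit
# before the box starts swapping? All per-process figures are guesses.
total_ram_mb = 512
os_and_services_mb = 150        # kernel, sshd, MySQL, cron, etc. (guess)
ram_per_worker_mb = 12          # a lean FastCGI/PHP worker (guess)
connections_per_worker = 4      # concurrent requests each worker can juggle

workers = (total_ram_mb - os_and_services_mb) // ram_per_worker_mb
print(f"{workers} workers, roughly {workers * connections_per_worker} "
      f"concurrent connections before swapping")
# 30 workers, roughly 120 concurrent connections before swapping
```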
After going cross-eyed over so many WordPress things, I ended up updating some of my WP plugins and even submitting some code to other people’s plugins. Not necessarily because of security features, but because WP changes over the years, with things becoming obsolete and new features becoming mainstream; some of the plugins were struggling to keep up, or had even stopped working altogether…
When I was happy with the results on that virtual machine, I went back to the old 2006 ‘server’ (a retired HP desktop computer) which powers our OpenSimulator grid. As I’ve written before, we had to abandon the idea of leasing a server for that, because it was simply too expensive. OpenSim consumes a lot of CPU and memory: around 1.2 GBytes of RAM for each OpenSim instance, plus some 30-50% of a 3 GHz CPU. There are obviously a few tricks that can be applied: compile fewer modules, shuffle regions around between instances so that the busiest (most prim-intensive) ones are spread out while neighbouring sims keep running in the same instance, and so forth. But, ultimately, OpenSimulator is a resource hog; there is little one can do about it but throw hardware at the problem. So we ended up using Kitely for the production environment (the one that can be visited) and the old server, where memory is cheap to add, for development. Every time a content development cycle finishes, the new content is published (by exporting an OAR file) to Kitely’s cloud.
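That region-shuffling trick is essentially a small bin-packing exercise. The naive greedy sketch below only balances prim counts across instances (it ignores the “keep neighbours together” constraint entirely), and the region names and prim counts are invented for illustration:

```python
# Naive greedy packing: assign the most prim-heavy regions first, each to
# the OpenSim instance with the lightest current load.
# Region names and prim counts are invented for illustration.
regions = {
    "Town Centre": 14000, "Harbour": 9000, "University": 7000,
    "Sandbox": 2500, "Meadows": 800, "Ocean": 200,
}
instances = {"opensim-1": [], "opensim-2": [], "opensim-3": []}

def instance_load(name):
    """Total prim count currently assigned to an instance."""
    return sum(regions[r] for r in instances[name])

for region in sorted(regions, key=regions.get, reverse=True):
    target = min(instances, key=instance_load)
    instances[target].append(region)

for name, assigned in instances.items():
    print(name, assigned, instance_load(name), "prims")
```

A real packing would also have to honour the neighbouring-sims constraint and each instance’s memory ceiling, which is exactly why it stays a manual juggling act.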
What hurt me most was that we couldn’t afford a better server, and running OpenSim was not the only thing it had to do. Groups, IMs, Profiles, and assorted other things (namely, part of my academic work) run on external web services, handled by the usual Apache + PHP + MySQL combination. When we had a full server hosted at a data centre, most of these services were handled… by DreamHost, since network connections between data centres are reasonably good these days. But for a SOHO environment this is not the case, and I really wasn’t impressed with all the network delays in communication. So, well, the box had to additionally run Apache + PHP, which would compete with the OpenSimulator instances for resources. Not a great idea. Ideally, we should have a second server just for the web things, but, alas, we have nothing left to run them on.
So I ended up taking the configuration used on the tiny virtual machine and implementing it on the OpenSim grid server. After all, it would get far less traffic than any of my blogs, but for OpenSim it was crucial to get a fast response to every request. And fast it is! My current configuration takes so few resources that the web server (no, I’m not using Apache) does not even show up in the list of the top 50 processes. And it’s blindingly fast, too. A pity that MySQL, which is shared with OpenSim, cannot be tweaked much, since OpenSim relies on it heavily and requires a completely different configuration: one that uses and abuses memory to keep good, fast caches.
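Deciding how much of the box MySQL may grab for those caches is really just a budget over whatever the OpenSim instances leave behind. A minimal sketch of that reasoning, assuming an 8 GB box and four instances (every figure here is an assumption, not my actual configuration):

```python
# Rough RAM budget for a box shared between OpenSim instances and MySQL.
# Every number here is an assumption for illustration, not my real setup.
total_ram_gb = 8.0
opensim_instances = 4
ram_per_instance_gb = 1.2        # the figure quoted above
os_overhead_gb = 0.5             # kernel, lightweight web server, etc.

leftover_gb = total_ram_gb - opensim_instances * ram_per_instance_gb - os_overhead_gb
# Give MySQL most of the leftover for its caches, but keep a safety margin.
mysql_cache_gb = max(0.0, leftover_gb * 0.75)
print(f"Leftover: {leftover_gb:.1f} GB, MySQL cache budget: {mysql_cache_gb:.1f} GB")
# Leftover: 2.7 GB, MySQL cache budget: 2.0 GB
```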
But at least all web communications external to OpenSim now run “instantly”, without significant delays. And, as a bonus, all those heavyweight OpenSim instances have a bit more memory and CPU to run with: not enough of a difference to be noticeable for the poor content creators, but the server is certainly far less stressed when serving all those requests. It’s not as fast as Kitely 🙂 (yet!) but it’s quite adequate for the few people who are logged in and working on the grid… and yes, there is even a WordPress-based site running there, too!
One day, someone with plenty of free time and lots of financial resources will possibly rewrite OpenSimulator from scratch. One good thing is that at least the central servers can be handled completely separately from the simulator instances; things like Simian’s central server, for example, run on PHP. We have to thank Linden Lab for that: as they moved the inter-sim communications to HTTP, the OpenSim crowd did the same, and, as such, all requests for login data, textures, assets, and so forth are pure HTTP calls, which can be handled by something much lighter than Mono. Thanks to the modular approach of OpenSim, this might go a bit further: there was a decision, for instance, to move things like Groups, Profiles, Search, Offline IMs, etc. completely out of the OpenSimulator code, so they can run on superfast web servers instead. It drove me to insanity to change all those modules (which were formerly compiled as part of the OpenSim code), but now I understand the wisdom of the decision: the more things that can be run outside OpenSim itself, the smaller each instance becomes, and the faster those features can be served externally.
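To give a feel for how thin such an externalised module can be: conceptually, something like offline IMs boils down to a tiny HTTP endpoint that stores and returns messages, which any lightweight web server can handle without Mono. The sketch below is a hypothetical stand-in, not the real OpenSim module protocol (the actual modules exchange their own XML/JSON formats):

```python
# Hypothetical sketch of an "offline IM"-style web service: a tiny HTTP
# endpoint that stores messages per avatar and hands them back on request.
# This is NOT the real OpenSim module protocol, just an illustration of how
# little is needed once such features live outside the simulator itself.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MESSAGES = {}   # avatar_id -> list of stored messages (in-memory only)

class OfflineIMHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Store a message: {"to": "<avatar_id>", "message": "<text>"}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        MESSAGES.setdefault(payload["to"], []).append(payload["message"])
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        # Fetch (and clear) stored messages for /<avatar_id>
        avatar_id = self.path.strip("/")
        body = json.dumps(MESSAGES.pop(avatar_id, [])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8003), OfflineIMHandler).serve_forever()
```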
Apparently, if I got this information correctly, some of the OpenSim core developers are thinking of going back to the monolithic approach, because those external modules are extremely hard to configure and maintain. Specifications change all the time, and they change between versions, without documentation; it’s up to the end-users to figure out on their own what the web services are supposed to return, by looking at the convoluted code and trying to make sense of it. So I’m not sure what the future will be, but, in my opinion, pushing things out of OpenSimulator was a nice trend which I hoped would continue. After all, the beauty of web services is that you can handle them so efficiently, as Linden Lab is realising: ultimately, their goal will be to have 90% of all requests handled by web servers running on cheap virtual boxes in Amazon’s cloud, and only let simulators do the few things that are impossible to do elsewhere. I’m sure OpenSimulator should follow that trend.
At the end of the day, it’s perhaps too much to expect OpenSim instances to fit in a handful of MBytes, as whole web servers certainly can. But there is some light at the end of the tunnel. The way I see it, the SL/OS communication protocols are not so different from, say, Diaspora, or StatusNet/identi.ca, or pump.io… or even old Jabber. All of these run on web services and can be efficiently handled by web servers. OpenSim should be the same.
A last note: one might wonder whether it’s possible to run OpenSimulator instances on virtual machines. Well, it certainly is, and a lot of people do that to keep running costs low. The problem is really how many resources the instances consume, all the time, even when the grid is totally empty. But people like Kitely have certainly proved that it’s possible to run a whole grid inside the virtual space of cloud computing and still get excellent performance. It’s just very hard to do it right; after all, Kitely took several years to develop.
And I wondered a bit about what Philip Rosedale means on his High Fidelity website when he talks about peer-to-peer computing. Maybe he’s planning to deploy a huge cloud running in people’s homes, each sharing a slice of their CPU and RAM towards a fully distributed grid, which would shuffle ‘scenes’ around depending on location and the resources available on each user’s computer. It’s a thought. After all, there are plenty of solutions around for spreading intense computing loads across people’s home desktops, like the famous BOINC platform used by SETI@home, among many others…