OpenSimulator: Multiple instances and monitoring


Recently I was “forced” (in the good sense of the word!) to get back to work on Beta Technologies’ own OpenSimulator mini-grid. It usually remains pretty much empty (and open to attacks from, uh, “visitors”) most of the time, ready to be used when something comes up — like a customer, some work for my PhD, or even some friends needing a temporary setup with a few islands to experiment with, and having no patience or time to set up their own environment. It’s not a “public” OpenSim grid by any means; on the other hand, it’s also not a “public sandbox” but mostly a “storage” of old projects and ongoing ones.

And the server lacks the ability to run everything smoothly. It’s actually an old PC that was used for a while until it became too obsolete to run the Second Life® Viewer. Now it gets an extended life as a Linux server running an OpenSim grid with 22 or so regions, behind an ADSL connection with poor bandwidth. Not exactly the best of circumstances — but it tends to work.

Somewhat.

So, bear with me as I describe a very technical and boring way to deal with all this 🙂 Maybe — just maybe — this might be useful for someone else who is running their own OpenSim grid.

From single-instance to multiple-instance

After the last project we had — during which I don’t do any updates or upgrades or tweaks, or the content creators will flog me within an inch of my life 🙂 — I decided I should step up from OpenSimulator 0.7.2 to 0.7.3.1, the (currently) stable branch. It brings a few improvements regarding meshes — and some configuration changes.

Upgrading OpenSim to a new version is always a small nightmare. You have to remember that the vast majority of OpenSim developers don’t actually use it in a production environment — they just back up some sample sims of different sizes, delete everything, start from scratch, and then retrieve their backups. In a production environment, this is not feasible: beyond prims, there is also user data, profiles, groups, and inventory to save. So this means that you’ll have to keep your database running and just upgrade it. Fortunately, since the very beginning of time, OpenSim checks the database and adds any extra features that are required to get it going. Sometimes this can be dramatic — I remember the switch from 0.6.7 to 0.7! — but usually it’s just a few extra columns on some tables, which get updated automatically.

No, the major problem is usually with the configuration files. The defaults change from version to version. So you can’t simply take your “old” configuration files and just drop them into the new environment: names might change, what used to be the default is now turned off, new options have been added, old options are made obsolete. So what one has to do is open the old configuration files — one by one — and check them against the new ones manually, line by line, to make sure everything is covered.

This is tedious work and prone to errors (there is always a file you forgot, or a new set of defaults that will make your new configuration fail utterly), and that’s one of the reasons why the Aurora Sim Team has provided an alternative OpenSim distribution which makes maintenance and configuration way simpler. Unfortunately, Aurora’s database is incompatible with the “core” OpenSim distribution, so I have never “migrated” — and have to patiently deal with OpenSim’s cumbersome configurations.

Cumbersome… yes, but also flexible. You’ll see why in a minute!

After upgrading to 0.7.3.1 everything seemed to work well… but suddenly I noticed that the server load was about 10 times higher than before! Memory seemed to be at a premium, and that pushed me to the OpenSimulator Mantis bug-tracking website in search of similar complaints and possible solutions. Alas, the problem is more complex than I thought. But I had no idea of that back then.

Some people suggested simply splitting regions among multiple instances. Up until now, I just had two Mono processes running: one was ROBUST (that’s the central server, dealing with assets, inventory, logins, and so forth), the other was OpenSimulator itself, hosting all the regions for our grid. My theory was that since the grid is so little used, it would be pointless to split the regions among several OpenSim processes. Well, expert OpenSim administrators told me otherwise. Just as Linden Lab splits regions among several (virtual) servers, the same should be done under OpenSim — this will conserve memory and make each individual instance more responsive. An extra bonus is that when one region crashes the simulator software, not all regions will be down: only those hosted on that particular instance.

These seemed to be compelling enough arguments. Anything that conserved memory on an overloaded server made sense to me, so I got my favourite text editor (Coda 2 for the Mac; version 2 has just recently been released) and started hacking at configuration files.

And reading installation instructions. I quickly came to the conclusion that OpenSim, by default, doesn’t make it easy to split regions across instances. What the OpenSim experts do is simply copy & paste the whole directory where OpenSim resides several times, and launch OpenSim from each, placing different regions on each directory.

Well, you can immediately see the maintenance problem this has. When you do a fresh install, this is obviously the easiest way to split your grid across several installations. But what about upgrades? Or if you add a new module (say, for economy, or group management…)? You have to go into each directory and configure everything separately. This will spell trouble — sooner or later, the configuration will be “out of sync” because you forgot to tweak one line in a 1,000-line configuration file, and suddenly one instance or the other will not work. Worse than that, if you compile OpenSim in different directories, from the perspective of a system administrator, it’s a different application. While many libraries and DLLs are common — and thus shared across instances — operating systems are better at sharing code among multiple instances of a single application than across “similar” (but not exactly the same!) applications.

So I looked into this and tried to figure out if anyone managed to run multiple instances from the same directory. Apparently, nobody does — or, if they do, they’re silent about it. I seriously suspect that all commercial OpenSim providers do it, but they tweak the code to their own purposes and don’t talk about it. They also upgrade their own grids very little, because it’s such a mess upgrading everything.

Back to the drawing board, I started to look at how OpenSim loads all its configuration files these days. In fact, the core developers are clever. OpenSim starts by loading one file, OpenSimDefaults.ini. This provides “reasonable” default configurations. That’s why you can start a standalone sim on your own desktop without a configuration file — the defaults are there to let you start running OpenSim immediately. Then you add all your “overrides” in OpenSim.ini. This means that in theory you should never touch OpenSimDefaults.ini (it gets updated with each successive version anyway) and just make whatever changes you need in OpenSim.ini. Later on, OpenSim will look for extra configuration files for your externally compiled modules and add them as well. And — this is the cool bit! — you can actually tell OpenSim where to find the configuration files with a command-line option.
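As a toy illustration of that layering (the override value here is purely for show, but http_listener_port genuinely lives in the [Network] section, with 9000 as the shipped default):

```ini
; OpenSimDefaults.ini (shipped defaults, loaded first; never edit this)
[Network]
http_listener_port = 9000

; OpenSim.ini (loaded afterwards; whatever you set here wins)
[Network]
http_listener_port = 9100
```

Later files simply override earlier ones, key by key, which is what makes the per-instance trick below possible.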

Thanks to this, my problem now seemed to be simple: I would just need to have a different OpenSim.ini for each instance, and load them appropriately from the command line.

Well, not really. OpenSim.ini is a huge file (even though it has been shortened since the developers now rely on the reasonable defaults set in OpenSimDefaults.ini). The problem with keeping separate copies of huge files is that they, too, tend to get out of sync quickly. And in truth, very little needs to change for each instance. In fact, I just need to change two things:

  1. The port it listens to for external TCP connections (http_listener_port)
  2. The location for the region files for this particular instance

It seems to be a waste to have several duplicates of a 920+-line file around for just those two changes!

Now here comes the cool bit. There is actually a way to have OpenSim load OpenSimDefaults.ini, OpenSim.ini, and extra .ini files! This is documented on the page for configuration options.

So here is what I’ve done. I’ve created a new directory under …//bin, where OpenSim.exe and OpenSim.ini are, and called it grid. Under that directory, I created new directories for each instance. Let’s call them instance01, instance02, instance03 and so forth (I personally prefer to give them names related to their Estates).

Then each of those instanceXX folders gets its own OpenSim.ini file, and a new folder called Regions. You’ll see why.
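In shell terms, setting up that skeleton is just a couple of mkdir calls. A sketch, assuming you run it from OpenSim’s bin directory and that the instance names are the examples above:

```shell
# Create the per-instance skeleton under ./grid
# (instance names are examples; add your own)
for inst in instance01 instance02 instance03; do
    mkdir -p "grid/$inst/Regions"
    touch "grid/$inst/OpenSim.ini" "grid/$inst/Regions/Regions.ini"
done
```
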

The ../grid/instanceXX/OpenSim.ini file is very simple; it just needs the following lines:

[Startup]
regionload_regionsdir = "//bin/grid/instanceXX/Regions/"
[Network]
http_listener_port = 90XX

Note that the http_listener_port has to be unique across instances. It’s unrelated to the individual regions’ ports: this one is TCP (for HTTP requests), while each region also has its own UDP port, which is what viewers connect to. If your grid is open to the public, don’t forget to add the extra ports on your firewall (I forgot, and was surprised that “nothing worked!”).

Now it’s time to split your regions among the many instances. Just copy & paste the region information from Regions.ini into each //bin/grid/instanceXX/Regions/Regions.ini file. (If you’re like me, and were still using the old XML region configuration files instead of the new Regions.ini, it’s simple: just start with an empty Regions.ini and launch OpenSim as described below. OpenSim will ask for the first “new” region — just feed it the UUID you already have from the XML configuration file, and it’ll retrieve the assets for it and write a new Regions.ini file for you. Then you have to add further regions using the create region command. Again, the trick is to feed it existing region UUIDs and copying the relevant data from the XML files — e.g. name, location, etc. It’s a bit tedious, but it will work, and you won’t lose any data.)
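For reference, a single entry in one of those per-instance Regions.ini files looks something like this (every value below is illustrative, not from my grid):

```ini
[Example Region]
RegionUUID = 11111111-2222-3333-4444-555555555555
Location = 1000,1000
InternalAddress = 0.0.0.0
InternalPort = 9010
AllowAlternatePorts = False
ExternalHostName = SYSTEMIP
```

Each region keeps its own UDP port (InternalPort), regardless of which instance it lives in.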

At the end, you’ll have something like this:


|
+-- bin
|
+-- OpenSimDefaults.ini (don't change this one — ever!)
+-- OpenSim.ini ("master" configuration file common to all instances)
+-- grid
|
+-- instance01
|   |
|   +-- OpenSim.ini
|   +-- Regions
|       |
|       +-- Regions.ini
|
+-- instance02
|   |
|   +-- OpenSim.ini
|   +-- Regions
|       |
|       +-- Regions.ini
|
...

…and so forth, until the last instance.

Now the command line magic. Under Unix (Linux/Mac) you can start the console with

cd /bin
screen -S instanceXX -d -m -l mono OpenSim.exe -hypergrid=true -inidirectory="./grid/instanceXX"

… if you’re running a hypergridded grid (as in my case); if not, leave -hypergrid=true out.

You can check that each instance picks up the right files when you launch it, because it will say so at the top of the console (or in the log):

2012-05-30 17:51:41,390 INFO - OpenSim.ConfigurationLoader Searching folder ././grid/instanceXX for config ini files
2012-05-30 17:51:41,413 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration settings
2012-05-30 17:51:41,414 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file /bin/OpenSimDefaults.ini
2012-05-30 17:51:41,529 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file /bin/OpenSim.ini
2012-05-30 17:51:41,597 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file /bin/grid/instanceXX/OpenSim.ini

As you can see, this is exactly the order in which we want OpenSim to load our configuration files. First, the overall defaults for this version. Then our common, “master” configurations. And finally, the changes for each instance.

Simple!

Now if the OpenSim developers release a new version, all you need to do is copy the grid configuration over, and launch all the instances with the new version. Assuming the core developers don’t change the command-line settings, this configuration should work… forever. Heh!
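That copy step can be sketched as a one-liner wrapped in a function (the function name and the assumption that both builds follow the layout above are mine):

```shell
# copy_grid_config OLD NEW: carry the whole per-instance tree from an
# old OpenSim build into a freshly unpacked one.
copy_grid_config() {
    old_dir="$1"
    new_dir="$2"
    cp -a "$old_dir/bin/grid" "$new_dir/bin/" && echo "grid config copied"
}
```

After that, it’s just a matter of relaunching each instance against the new binaries.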

Splitting and rotating logs

All right, so the above solution is nice, but not perfect. Suppose that you wish to get separate logs for each instance, instead of throwing them all into a single OpenSim.log file. There are some good reasons for that: on underpowered servers (somehow, in my two decades of Unix administration, I only ever get hold of old, underpowered servers, who knows why…), having lots of processes writing to the same file is not an insanely good idea, since each process will have to lock the file for writing, and release it to the other processes when finished — forcing them all to wait until that happens. Oh, sure, I know I’m talking about nanoseconds here. But the slower the server, the more those nanoseconds add up. This is the reason why modern multitasking operating systems use the syslog facility to write to log files that are used by a lot of processes… but I digress. The point is, Mono, true to its Windows heritage (gasp!), does everything differently. It has its own library to deal with log writing. We’ll have to tackle that instead.

Also, the more this file grows, the more work the kernel has to do to append lines to it. Again, on superfast computers with lightning-fast RAID arrays, you probably won’t see any difference between writing to a 100-byte file or a 100-TByte file. But on old, underpowered servers, this will make a difference. Again, modern Unixes have another system tool to deal with that — Linux mostly uses logrotate, FreeBSD and Mac OS X use newsyslog (there might be a few more). The principle is the same: you kill the application writing to the logfile, archive the current logfile (say, appending a .1 to it, or even compressing it to save disk space), create a brand-new logfile, and re-launch the application. This can easily be accomplished by working together with cron. Those services then have further options to say how often the check will be performed, whether it should happen every day or only when the file hits a certain size limit, and so forth. In most cases this is acceptable behaviour for stateless applications.
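For a stateless service, a typical logrotate stanza implementing exactly that recipe might look like the following (the path and service name are placeholders; this is the classic approach that, as we’ll see, OpenSim lets us avoid):

```
/var/log/example-app.log {
    size 1M
    rotate 2
    compress
    missingok
    postrotate
        /etc/init.d/example-app restart > /dev/null
    endscript
}
```

Note the restart in postrotate: the application has to be bounced so it reopens its log file, which is precisely what we cannot afford with OpenSim.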

What are stateless applications? Well, pretty much everything that doesn’t require knowing “what happens to my data when I kill the application?” Web servers are a typical example: you request a URL and it’s an atomic operation — you get a result. The next request may come from the same browser or one that is completely different, but the web server doesn’t care. It handles each request atomically and independently of the others (that’s why we need to save session data in cookies or on backend servers to make a web-based application persistent). Mail servers work similarly. In fact, most Internet-based services actually work that way. The ones offering persistent services that are not stateless usually store things on a database server. Of course, the database server itself should not be shut down just to rotate a logfile — and neither should OpenSim!

Enter the fantastic world of log4net. This is a framework created by the Apache foundation (aye, the same guys that give us that fantastic web server which powers something like 35% of the Web or so) which deals with configuring how the logs should be written out to disk (or other facilities) on Mono/.NET applications. Because the OpenSim core developers included log4net on OpenSim, you can tweak it in order to get full log rotation for the multiple instances, without the need of relying on external tools like logrotate or newsyslog — and, most importantly, you won’t need to restart OpenSim every time the log reaches a certain limit.

Here is an example of how to tweak OpenSim to write a new log as soon as the current one hits 1 MByte, keeping the last two logs as backups. You can accomplish way more complex things with log4net. Fortunately, the good OpenSim core developers have provided detailed instructions on how to do this.

Basically, all is done through a little-known configuration file called OpenSim.exe.config. This holds mostly the configuration (in XML) for logging. So what you need to do is to copy this file over and over again to each /bin/grid/instanceXX/ directory. Then you have to edit each one to reflect your current configuration.

About line 21 (on OpenSim 0.7.3.1) you will have something like:

<appender name="LogFileAppender" type="log4net.Appender.FileAppender">
  <file value="OpenSim.log" />
  <appendToFile value="true" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level - %logger %message%newline" />
  </layout>
</appender>

This is the standard behaviour: append new log lines to the file OpenSim.log and let it grow “forever”. We want to rotate logs instead, so we’ll use the log4net RollingFileAppender module instead of the FileAppender used by default. So edit this section and replace it with something like this:

<appender name="RollingFileAppender" type="log4net.Appender.RollingFileAppender">
  <file value="OpenSim.instanceXX.log" />
  <appendToFile value="true" />
  <rollingStyle value="Size" />
  <maxSizeRollBackups value="2" />
  <maximumFileSize value="1MB" />
  <staticLogFileName value="true" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level - %logger %message%newline" />
  </layout>
</appender>

Note that I’ve changed the configuration shown on the OpenSim Wiki to place the logs for each instance on a separate file (OpenSim.instanceXX.log).

A bit below in the very same OpenSim.exe.config, under the <root> section, change

<appender-ref ref="LogFileAppender" />

to

<appender-ref ref="RollingFileAppender" />

So now your directory layout should look something like this:


|
+-- bin
|
+-- OpenSimDefaults.ini (don't change this one — ever!)
+-- OpenSim.ini ("master" configuration file common to all instances)
+-- grid
|
+-- instance01
|   |
|   +-- OpenSim.ini
|   +-- OpenSim.exe.config
|   +-- Regions
|       |
|       +-- Regions.ini
|
+-- instance02
|   |
|   +-- OpenSim.ini
|   +-- OpenSim.exe.config
|   +-- Regions
|       |
|       +-- Regions.ini
|
...

That’s it! Relaunch your instances, and now you should get an extra line on the logs at the very top (after the launch) showing:

2012-05-29 09:37:39,969 INFO  - OpenSim.Application [OPENSIM MAIN]: configured log4net using "./grid/instanceXX/OpenSim.exe.config" as configuration file

This should show that everything has been loaded nicely 🙂

Starting up, shutting down and monitoring your instances

Ok, let’s face it: system administrators are lazy. The last thing you want to do is spend your time watching logs and consoles, seeing if a sim crashes (or a whole instance!), and restarting it from scratch. In my personal case, while struggling with a huge increase in memory consumption and CPU load when switching from 0.7.2 to 0.7.3.1, I also really wanted a way to get alerts if something goes seriously wrong with the grid, to relaunch OpenSim when it crashes, and so forth.

There are a gazillion Unix utilities for doing all that and much more, many of them commercial, and a trillion more that are free and open source. I’ll stick with a relatively simple one, called monit.

Monit is a relatively simple, rule-based system. You define a few rules about what should be checked and what happens when the check fails. At its simplest level, it can ping a server for you, and tell you if it’s responding. If not, it emits an alert. You can define what happens with the alert — log it to a file, send you an email, and so forth. You can also define what happens with multiple alerts — e.g., shut down the machine if nothing replies any longer (a bit drastic, but possible!). Monit also works at the application level: so you can make an HTTP call to a webserver and not only ping it, but also see if it replies to a request as you expect (say, check for a web page that should be there, and send you an alert if it isn’t). Since it can pretty much call anything, and not merely “send alerts”, you can get it to do more complex things: for example, test if the database server is replying to queries inside a predefined time frame, and, if not, attempt to repair/optimise tables; if that fails, restart the database server.

There are also a lot of “predefined” tests that you can do, like checking memory consumption per monitored process, CPU consumption, or overall CPU load. All of that appears on a nice control Web page (password-protected), where you can also manually stop/start services. And if you’re monitoring a whole array of servers — a huge grid, with multiple servers and multiple instances per server — you can use M/Monit, a “master” monit that monitors all the other monits in a network from a single page. Very cool and simple to use.

Before you start commenting that package X does all what monit does and much more… rest assured, I’m aware of many of those packages. My choice of monit is not because it’s the “best”, but because the rules are very easy to write, and the interface is designed for stupid and lazy people like me 🙂

Apparently commercial OpenSim developers also use monit, so I took a peek at some configuration suggestions that they’ve made. The principle is simple:

  1. First you need to let monit know the Process ID (PID) of each instance, so that it can restart it if needed.
  2. Then you have to turn on some very simple statistics on each instance (OpenSim has lots of ways to give remote access to statistics), so that monit can check if the instance is “alive”. Dave Coyle suggests turning on JSON-based statistics (they are very compact and presented very quickly) and allow monit to check for SimFPS. If SimFPS is at zero, it means that this instance is probably dead.

We’ll use Dave Coyle’s suggested monit for opensim package but add a few tweaks to deal well with multiple instances of OpenSim.

The first thing is to go back to each OpenSim.ini configuration file for the instances. We’ll need to add two new lines. The first will save the instance’s PID to a file (which we’ll feed to monit later). The second turns on JSON-based statistics. So now your OpenSim.ini will look like:

[Startup]
PIDFile = "//bin/grid/instanceXX/instanceXX.pid"
regionload_regionsdir = "//bin/grid/instanceXX/Regions/"
Stats_URI = "jsonSimStats"
[Network]
http_listener_port = 90XX

Not too bad. Notice that traditionally, PID files are written under /var/run/, at least on Debian-inspired Linuxes and BSD-inspired ones (like Mac OS X). I had a problem with Ubuntu: since OpenSim runs under its own user (opensim in my case), it has no permission to write to that directory. Of course I could create a subdirectory there and change its ownership. The trouble is that Ubuntu, on a system restart, will wipe /var/run (which makes sense, since no applications are running when it boots 🙂 ). This can be changed somewhere, somehow, but I was lazy again, and just kept the PID files under a directory where I’m sure OpenSim will be able to find them and write to them.

Now we come to monit. You might not be too surprised that these days, like most applications, monit has a “master” configuration file in /etc/monit/monitrc, where reasonable defaults are set, and then you can add further configuration files for each application under /etc/monit/conf.d. This is what I’ve used.

First, let’s start with /etc/monit/monitrc.

I want to be able to administer monit via a Web interface (cooler to do!) so I have uncommented the following bits:

set httpd port 2812 and
allow admin:

Then, since my server has been misbehaving lately (I’m still discussing on OpenSimulator’s Mantis what is causing the high memory consumption and high CPU usage), I’ve also made sure I get some alerts for high resource consumption:

  check system rivendell
if loadavg (1min) > 6 then alert
if loadavg (5min) > 4 then alert
if memory usage > 95% then alert
if swap usage > 25% then alert

You should adjust that for your own server. Load average over 4 or 6 is really way too high!

And finally, I also wanted email to be sent every time an “alert” is triggered. To do that, just uncomment the line which says:

  set alert [email protected]

As a side-note, if you’re running your server behind a NAT firewall, you have two choices. The first is to send all emails through your corporate/campus email server. That’s probably best and easiest. You can see how it gets configured using this example — in my case, I’m using a configuration to send mail via Gmail’s own servers:

  set mailserver smtp.gmail.com port 587
username @gmail.com
password ""
using TLSV1
set mail-format { from: @gmail.com }

This works like a charm. I actually don’t use Gmail, though, because I want to use the same account for monit and inside OpenSim itself (and OpenSim allegedly doesn’t support the TLS or SSL authentication that Gmail requires). Also, I’m paranoid about leaving sensitive passwords (like my own Gmail password!) lying around in plain view in a configuration file 🙂 Of course, you can simply register a new Gmail account just for monit.

If you’re very brave, you can run your own local mail server. On Ubuntu, the recommended mail transfer agent is Postfix. However, I think it’s not worth the time and patience to run a mail server, deal with spam and inter-server authentication and so forth. Gone are the days where every Linux box on the planet ran their own mail servers 🙂

If you’re running MySQL and possibly Apache (for extra grid services like groups, profiles, IMs between grids etc.) on the same box, you might wish to add the following files under /etc/monit/conf.d:

/etc/monit/conf.d/apache.conf:

# Monitoring the apache2 web service.
# It will check the process apache2 with the given PID file.
# If the process name or pidfile path is wrong, monit will
# report the check as failed even though apache2 is running.
check process apache2 with pidfile /var/run/apache2.pid
# Below are the actions taken by monit when the service gets stuck.
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# The admin will be notified by mail if one of the conditions below is satisfied.
#if cpu is greater than 60% for 2 cycles then alert
#if cpu > 80% for 5 cycles then restart
if totalmem > 400.0 MB for 5 cycles then restart
if children > 250 then restart
#if loadavg(5min) greater than 10 for 8 cycles then stop
if failed host 127.0.0.1 port 80 protocol HTTP then restart
if 3 restarts within 5 cycles then timeout
group server

These are pretty much standard tests for Apache, which I’ve tweaked slightly to better reflect my own server’s setup.

/etc/monit/conf.d/mysql.conf:

check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout

All right. Do a service monit start and you should be able to log in to http://:2812/ and see your monit already doing something 🙂

Now it’s time to tackle OpenSim. This is not going to be easy, so bear with me for a moment.

My source of inspiration, again, was Dave Coyle. Dave actually uses an external script to launch OpenSim (and stop it), so that monit just calls that script on start program and stop program. All very nice but it was designed to work for single-instance configurations — or, if you wished, you could just make copies of that bash script for each instance.

As I said, I dislike the idea of having lots of junk around with “similar” configurations, so I’ve tweaked Dave’s script to deal with all instances from the same location and to take into account my “special” configuration (i.e. instances loading their specific .ini files and region files under ./grid/). Also, I rather fancy using screen to be able to log in directly to each console. So, in my proposed setup, each instance gets a screen console named after the instance itself.

Here goes Dave’s script:

#!/bin/bash
#
# for usage, run without arguments
#
# see ../README for setup instructions.
#
# Original code by Dave Coyle (http://coyled.com)
# Tweaks by Gwyneth Llewelyn (http://gwynethllewelyn.net/)
#
# Requires bash 4.
#
# The original script assumed that you'd be running one sim per instance,
#  and launch a different script per sim.
# These changes assume you have multiple instances with multiple sims,
#  and that all instances are launched from the same place, each with
#  a unique identifier.

# List of valid instances. Make sure you add all of your instances here.
declare -A instances
for index in instance01 instance02 instance03 instance04 instanceXX
do
    instances[$index]=1
done

show_help() {
    echo "usage: ./opensim {start|stop|restart|console} instance-name"
    echo -n "    where instance-name is one of: "
    echo ${!instances[*]}
}

check_user() {
    if [ "$USER" != 'opensim' ]; then
        echo "This must be run as user opensim"
        exit 1
    fi
}

setup() {
    if [ -z "$1" ]; then
        show_help
        exit 1
    else
        SIM=$1
    fi

    if [[ ${instances[$SIM]} ]]; then
        MONO="/usr/bin/mono"
        OPENSIM_DIR=""
        PID="$OPENSIM_DIR/bin/grid/${SIM}/${SIM}.pid"
        SCREEN="/usr/bin/screen"
        GRID_DIR="./grid"
        # set GRID_DIR to the subdirectory where your individual
        #  instance configuration is
    else
        echo "Sorry, I've never heard of sim ${SIM}.  Exiting."
        exit 1
    fi
}

do_start() {
    if [ -z "$1" ]; then
        show_help
        exit 1
    else
        SIM=$1
    fi

    setup $SIM
    check_user

    cd ${OPENSIM_DIR}/bin && $SCREEN -S $SIM -d -m -l $MONO OpenSim.exe \
        -hypergrid=true -inidirectory="$GRID_DIR/$SIM" \
        -logconfig="$GRID_DIR/$SIM/OpenSim.exe.config"
}

do_kill() {
    if [ -z "$1" ]; then
        show_help
        exit 1
    else
        SIM=$1
    fi

    setup $SIM
    check_user

    if [ -f $PID ]; then
        kill -9 `cat $PID`
    else
        echo "Sorry, ${SIM} PID not found."
        exit 1
    fi
}

do_console() {
    if [ -z "$1" ]; then
        show_help
        exit 1
    fi

    setup $1
    check_user

    # reattach to the instance's running screen session
    #  (detach again with Ctrl-A d)
    $SCREEN -r $SIM
}

case "$1" in
    start)
        do_start $2
        ;;
    stop)
        do_kill $2
        ;;
    kill)
        do_kill $2
        ;;
    restart)
        do_kill $2
        do_start $2
        ;;
    console)
        do_console $2
        ;;
    *)
        show_help
        exit 1
        ;;
esac

Make sure you make this script world-executable (e.g. chmod a+x /usr/local/bin/opensim). Now switch to the user that you’re running OpenSim under (in my case it’s opensim; if not, you’ll have to change this on the script as well) and try out:

/usr/local/bin/opensim start instance01

If all went well, screen -ls should now show you instance01 available and allow you to connect to it.

This is the time to test starting and stopping instances to make sure you don’t have any errors!
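If you want a quick scripted sanity check instead of eyeballing screen -ls, a helper like this works (a sketch; the function name is mine, and it merely checks whether a PID file points at a live process):

```shell
# check_instance PIDFILE: print "running" if the PID file exists and
# the process it names is alive, "stopped" otherwise.
check_instance() {
    pidfile="$1"
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"
    else
        echo "stopped"
    fi
}

# e.g.: check_instance ./grid/instance01/instance01.pid
```

This is, in essence, the same liveness test monit will perform for us automatically below.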

Now let’s automate it. Go back to /etc/monit/conf.d and create a new file, opensim.conf, inside it:

/etc/monit/conf.d/opensim.conf:

# manage the OpenSim process for Your Sim
#
# usage:
#     monit start your_sim
#     monit stop your_sim
#     monit restart your_sim
#
# see 'daemon' setting in /etc/monit/monitrc for the cycle length.
# on ubuntu/debian, this is overridden by the CHECK_INTERVALS var in
# /etc/default/monit .  the below assumes you've set it to 30 seconds.
#
# if process dies, will restart sim within 30 seconds.  if process
# dies 5 times in as many tries, will stop trying and send email
# alert.
#
# if SimFPS drops to 0 for 2 minutes, restart.
#
# if process CPU usage stays above 300% for 2 minutes, restart.
#
# see ../README for configuration instructions.
#
# Original code by Dave Coyle (http://coyled.com/2010/07/07/monit-and-opensim/)
# ROBUST server
check process ROBUST with pidfile /bin/grid/ROBUST.pid
    start program = "/bin/bash -c 'cd /bin;/usr/bin/screen -S Robust -d -m -l mono Robust.exe -inifile=Robust.HG.ini'"
        as uid opensim and gid opensim
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed host localhost port 8003
        for 4 cycles
        then restart
    group opensim

check process instance01 with pidfile /bin/grid/instance01/instance01.pid
    start program = "/usr/local/bin/opensim start instance01"
        as uid opensim and gid opensim
    stop program = "/usr/local/bin/opensim stop instance01"
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed url http://localhost:9001/jsonSimStats/
        and content != '"SimFPS":0.0,' for 4 cycles
        then restart
    if failed url http://localhost:9001/jsonSimStats/
        and content == '"SimFPS":' for 4 cycles
        then restart
    group opensim

check process instance02 with pidfile /bin/grid/instance02/instance02.pid
    start program = "/usr/local/bin/opensim start instance02"
        as uid opensim and gid opensim
    stop program = "/usr/local/bin/opensim stop instance02"
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed url http://localhost:9000/jsonSimStats/
        and content != '"SimFPS":0.0,' for 4 cycles
        then restart
    if failed url http://localhost:9000/jsonSimStats/
        and content == '"SimFPS":' for 4 cycles
        then restart
    group opensim
[...]

and so forth for all the instances.

A few explanations. I have added monitoring for the ROBUST server as well, but there is no special script for it (Dave Coyle’s script only addresses OpenSim instances, not ROBUST), so I just start it from a command line. ROBUST doesn’t seem to expose many statistics (unlike the OpenSim instances), but it will reply to HTTP requests on port 8003, assuming that’s the default port you’re using for ROBUST; check the [Network] section of your Robust.ini or Robust.HG.ini file:

[Network]
port = 8003

That is the right port to check for ROBUST.

Unfortunately, monit doesn’t (yet?) allow you to create a single “template” for all the instances, so there is no easier way than copying and pasting the same block over and over again. I hate doing that, but it seems to be the only way, unless someone knows of a better trick. As you can see, the test for whether an OpenSim instance is still alive checks the SimFPS: 0.0 means a “dead region”.
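Until monit grows a template facility, a small shell script can at least generate the repeated blocks for you. This is just a sketch, assuming the layout used above (instanceNN lives in /bin/grid/instanceNN and listens on port 90NN; adjust the port scheme if yours differs, as mine does):

```shell
#!/bin/sh
# Sketch: emit one monit block per instance instead of copying & pasting.
# Assumptions: instanceNN in /bin/grid/instanceNN, http_listener_port 90NN.
blocks=""
for i in 01 02 03; do
    blocks="${blocks}$(cat <<EOF
check process instance${i} with pidfile /bin/grid/instance${i}/instance${i}.pid
    start program = "/usr/local/bin/opensim start instance${i}"
        as uid opensim and gid opensim
    stop program = "/usr/local/bin/opensim stop instance${i}"
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed url http://localhost:90${i}/jsonSimStats/
        and content != '"SimFPS":0.0,' for 4 cycles
        then restart
    group opensim
EOF
)

"
done
printf '%s' "$blocks"
```

Run it with the instance numbers you actually have and append the output to opensim.conf, then double-check each port against the instance’s ini file.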

The localhost:90XX port is whatever you’ve configured for each instance (the instance’s http_listener_port). There are also a few extra checks for insanely high loads; you could additionally check memory consumption (remember, the more prims on a region, the more memory the instance will require) and send alerts if something seems suspicious: for example, a visitor starting a lot of scripts and prims will require more CPU and memory, so you can get an alert if that happens.
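To see by hand what monit is matching, you can fetch the stats page yourself. The sample body below is made up for illustration; against a live instance you would feed the same grep the output of `curl -s http://localhost:9001/jsonSimStats/` instead:

```shell
#!/bin/sh
# Made-up sample of a jsonSimStats body; against a live instance use:
#   sample=$(curl -s http://localhost:9001/jsonSimStats/)
sample='{"Version":"OpenSim 0.7.3.1","SimFPS":55.0,"PhysFPS":55.0,"Agents":1}'

# monit restarts the instance when the body contains "SimFPS":0.0,
if printf '%s' "$sample" | grep -q '"SimFPS":0.0,'; then
    status="dead region"
else
    status="region alive"
fi
echo "$status"
```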

That’s pretty much it. Restart monit with service monit restart and all instances should now appear on monit’s Web page. You can not only see some usage statistics but also start and stop each instance with a click 🙂 Very easy and cool! As an extra bonus, when the server reboots due to a power failure or the like, monit will, as it launches, check whether OpenSim is up and promptly launch everything.
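In case the Web page doesn’t show up: monit’s embedded web server has to be switched on in /etc/monit/monitrc first. A minimal sketch (the port and credentials below are just example values, pick your own):

```
set httpd port 2812
    use address localhost    # only accept connections from localhost
    allow admin:monit        # user:password for the web page
```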

This is not perfect. Sometimes instances take a long time to launch. When that happens, they might be sending “neighbour woke up” messages, or trying to contact ROBUST, and fail to launch; monit will then attempt to relaunch them over and over again, which does not always succeed if they take too long. In some cases I’ve seen instances come up without any regions in them. Looking at the monit manual, there are ways to deal with timeouts, or with instances that take longer than usual to start, by delaying the tests for further cycles (the checks above use four-cycle windows, about two minutes, which should be enough in most cases, but sometimes it’s too short).
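One such knob from the monit manual is a `with timeout` on the start program, which gives a slow instance more time before monit declares the start failed. A sketch (the 120 seconds is a guess you would tune per instance):

```
check process instance01 with pidfile /bin/grid/instance01/instance01.pid
    start program = "/usr/local/bin/opensim start instance01"
        as uid opensim and gid opensim
        with timeout 120 seconds
```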

Still, it works for me, so the best I can hope for is that this is useful for someone else! Enjoy, and let me know about any suggested improvements/workarounds.

Oh, a final note. I could put this up on some “official” Wiki or so. Unfortunately, I’m very unlucky with dealing with Wiki admins. They tend to kick me out all the time and delete all my content for no rational reason whatsoever. Thus I’m reluctant to spend hours writing information like this just to get it deleted at a whim 🙁 Any OpenSim-related Wiki administrator is more than welcome to copy this article and tweak it to their style and personal taste on their own Wikis, but don’t expect me to go over there and maintain it 🙂

CC BY 4.0 OpenSimulator: Multiple instances and monitoring by Gwyneth Llewelyn is licensed under a Creative Commons Attribution 4.0 International License.

About Gwyneth Llewelyn

I'm just a virtual girl in a virtual world...

  • SignpostMarv Martin

    Brief comment while I’m still reading; My year-old article on deploying OpenSim on Windows Server 2008 covers running multiple instances from one install directory, although I’ll admit it doesn’t exactly jump out at you 😛

  • Arielle Popstar

    I read all that and at the end shed a tear of joy that I run all my sims on Windows 🙂

  • [email protected]:disqus 🙂 I guess that what makes the difference for you is that you have, say, 64 GB of RAM on your 16-core server… if it’s running Windows or not is pretty much irrelevant.

  • … actually, scratch that. Someone on the opensim-users mailing list was just asking if it would be appropriate to run OpenSim with 192 GBytes for 56 regions!

    This article is for someone wishing to run half that amount of regions in as little as… 2 (yes, two) GBytes.

  • Arielle Popstar

    My 16-core 64 GB server is on a layaway plan till 2015, and in the meantime I use a dual-core 2 GB laptop on 32-bit Windows 🙂
    I just found it interesting that it was suggested multiple instances would conserve memory over a single instance containing the same number of sims. I did some testing and even a quick glance doesn’t bear that out, in Windows at least. As an example, I started up several empty 1-sim instances and found each to require approximately 100 MB of RAM, whereas an empty 16-sim instance required 123 MB total. Projecting that out, it would seemingly require a minimum of 1600 MB of RAM if all 16 sims were on their own instances vs. the 123 MB when combined into one. This appeared to hold approximately true whether I ran on SQLite or MySQL, in grid or standalone mode.

    mmm… curious now how many sims I could put into a single instance before I use up all the RAM 😉

  • I believe that under 0.7.4RC1 it’s necessary to add the RollingFileAppender to the “main” OpenSim.exe.config (and not only to the ones for each instance). When adding that, logs started to rotate…

    The same goes for Robust.exe.config as well.
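    For reference, this is the kind of block meant here, as log4net understands it inside OpenSim.exe.config / Robust.exe.config (the file name, rolling style and pattern below are just example values, adjust to taste):

```xml
<appender name="LogFileAppender" type="log4net.Appender.RollingFileAppender">
  <file value="OpenSim.log" />
  <appendToFile value="true" />
  <rollingStyle value="Date" />
  <datePattern value="'.'yyyy-MM-dd" />
  <layout type="log4net.PatternLayout">
    <conversionPattern value="%date %-5level - %logger %message%newline" />
  </layout>
</appender>
```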

  • holy crud! how did you ever figure this out! it sounds like an elegant solution and your explanations are wonderful for each point. thank you for posting about this =)

  • Necessity is the mother of invention 🙂

    Ironically, I have had some “angry” replies on the OpenSim mantis (bug tracking system) because of the way I’m using the RollingFileAppender logging facility. It’s not supposed to work with multiple instances and was never designed to be used in the way I use it. So be forewarned, this functionality might be just a “hack” that happens to work — because it followed logically from what the OpenSim developers have implemented — but which was never intended to be used this way. It still works fine, though 🙂

  • ELQ

    This is a great post, very informative! Using the -logconfig switch would have saved you all that RollingFileAppender stuff, but it’s awesome how you figured out how to make it do what you wanted 😀

  • But I’m using that…

  • gimisa

    I know this post is old, but the subject is still relevant.

    I was running Diva (since 0.6.7) on standalone. I was starting to get slow responses and crashes on OAR loading with 8 regions. After reading your article I started ROBUST and split my simulator the way you suggest. Works GREAT!!!!

    Many, Many tks.
    GiMiSa

  • Jeff Kelley

    ConsolePrompt = "Simulator ${Environment|SIMULATOR} (R) "
    regionload_regionsdir = "../regionfiles/simul${Environment|SIMULATOR}"
    http_listener_port = 900${Environment|SIMULATOR}

    then

    export SIMULATOR=$1
    screen -S "sim$1" -d -m mono OpenSim.exe

    Et voila! No more per-instance ini files.
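    Spelled out as a tiny launcher (a sketch; the "simN" session name and 900X port scheme are my assumptions, and the real screen line is left in a comment so that running this only prints what it would do):

```shell
#!/bin/sh
# Hypothetical launcher for the single-ini trick: ./startsim.sh 3
# OpenSim expands ${Environment|SIMULATOR} from this environment variable.
SIMULATOR=${1:-1}
export SIMULATOR
session="sim${SIMULATOR}"
msg="starting ${session} on port 900${SIMULATOR}"
echo "$msg"
# the real launch line would be something like:
#   cd /bin && exec screen -S "$session" -d -m mono OpenSim.exe
```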

  • Endless Tears

    Will ../grid/instanceXX/OpenSim.ini still work in OpenSim 0.8.1.1?

  • Endless Tears

    Exception: System.UnauthorizedAccessException: Access to the path “/bin/grid” is denied.