OpenSimulator: Multiple instances and monitoring

Recently I was “forced” (in the good sense of the word!) to get back to work on Beta Technologies’ own OpenSimulator mini-grid. It usually remains pretty much empty (and open to attacks from, uh, “visitors”) most of the time, ready to be used when something comes up — like a customer, some work for my PhD, or even some friends needing a temporary setup with a few islands to experiment with, and having no patience or time to set up their own environment. It’s not a “public” OpenSim grid by any means; on the other hand, it’s not a “public sandbox” either, but mostly “storage” for old and ongoing projects.

And the server lacks the power to run everything smoothly. It’s actually an old PC that was used for a while until it became too obsolete to run the Second Life® Viewer. Now it gets an extended life as a Linux server running an OpenSim grid with 22 or so regions, behind an ADSL connection with poor bandwidth. Not exactly the best of circumstances — but it tends to work.

Somewhat.

So, bear with me as I describe a very technical and boring way to deal with all this 🙂 Maybe — just maybe — this might be useful for someone else who is running their own OpenSim grid.

From single-instance to multiple-instance

After the last project we had — during which I do no updates, upgrades or tweaks, or the content creators would flog me within an inch of my life 🙂 — I decided I should step up from OpenSimulator 0.7.2 to 0.7.3.1, the (current) stable branch [at the time of writing this article, that is]. It brings a few improvements regarding meshes — and some configuration changes.

Upgrading OpenSim to a new version is always a small nightmare. You have to remember that the vast majority of OpenSim developers don’t actually use it in a production environment — they just back up a few sample sims of different sizes, delete everything, start from scratch, and then restore their backups. In a production environment, this is not feasible: beyond prims, there is also user data, profiles, groups, and inventory to save. So this means that you’ll have to keep your database running and just upgrade it. Fortunately, since the very beginning of time, OpenSim checks the database and adds whatever schema changes are required to get it going. Sometimes this can be dramatic — I remember the switch from 0.6.7 to 0.7! — but usually it’s just a few extra columns on some tables, which get updated automatically.

No, the major problem is usually with the configuration files. The defaults change from version to version. So you can’t simply take your “old” configuration files and just drop them into the new environment: names might change, what used to be the default is now turned off, new options have been added, old options are made obsolete. So you have to open the old configuration files — one by one — and check them against the new ones manually, line by line, to make sure you have covered everything.

This is tedious work and prone to errors (there is always a file you forgot, or a new set of defaults that will make your new configuration fail utterly), and that’s one of the reasons why the Aurora Sim Team has provided an alternative OpenSim distribution which makes maintenance and configuration way simpler. Unfortunately, Aurora’s database is incompatible with the “core” OpenSim distribution, so I have never “migrated” — and have to patiently deal with OpenSim’s cumbersome configurations.

Cumbersome… yes, but also flexible. You’ll see why in a minute!

After upgrading to 0.7.3.1 everything seemed to work well… but suddenly I noticed that the server load was about 10 times higher than before! Memory seemed to be at a premium, and that pushed me to go to the OpenSimulator Mantis bug tracking website in search of similar complaints and possible solutions. Alas, the problem is more complex than I thought. But I had no idea of that back then.

Some people suggested simply splitting regions among multiple instances. Up until now, I just had two Mono processes running: one was ROBUST (that’s the central servers, dealing with assets, inventory, logins, and so forth), the other was OpenSimulator itself, hosting all the regions for our grid. My theory was that since the grid is so little used, it would be pointless to split the regions among several OpenSim processes. Well, expert OpenSim administrators told me otherwise. Just as Linden Lab splits regions among several (virtual) servers, the same should be done under OpenSim as well — this conserves memory and makes each individual instance more responsive. An extra bonus is that when one region crashes the simulator software, not all regions will be down: only those hosted on that particular instance.

These seemed to be compelling enough arguments. Anything that conserved memory on an overloaded server made sense to me, so I got my favourite text editor (Coda 2 for the Mac; version 2 had just been released) and started hacking at configuration files.

And reading installation instructions. I quickly came to the conclusion that OpenSim, by default, doesn’t make it easy to split regions across instances. What the OpenSim experts do is simply copy & paste the whole directory where OpenSim resides several times, launch OpenSim from each copy, and place different regions in each directory.

Well, you can immediately see the maintenance problem this has. When you do a fresh install, obviously this is the easiest way to split your grid across several installations. But what about upgrades? Or if you add a new module (say, for economy, or group management…)? You have to go into each directory and configure everything separately. This will spell trouble — sooner or later, the configuration will be “out of sync” because you forgot to tweak one line in a 1,000-line configuration file, and suddenly one instance or the other will not work. Worse than that, if you compile OpenSim in different directories, from the perspective of a system administrator, it’s a different application. While many libraries and DLLs are common — and thus shared across instances — operating systems are better at sharing code among multiple instances of a single application than across “similar” (but not identical!) applications.

So I looked into this and tried to figure out if anyone managed to run multiple instances from the same directory. Apparently, nobody does — or, if they do, they’re silent about it. I seriously suspect that all commercial OpenSim providers do it, but they tweak the code to their own purposes and don’t talk about it. They also upgrade their own grids very little, because it’s such a mess upgrading everything.

Back to the drawing board, I started to look at how OpenSim loads all its configuration files these days. In fact, the core developers are clever. They start by loading one file, OpenSimDefaults.ini. This provides “reasonable” default configurations. That’s why you can start a standalone sim on your own desktop without a configuration file — the defaults are there to let you start running OpenSim immediately. Then you add all your “overrides” in OpenSim.ini. This means that in theory, you should never touch OpenSimDefaults.ini (it gets updated with each successive version anyway) and just make whatever changes you need in OpenSim.ini. Later on, OpenSim will look for extra configuration files for your externally compiled modules and add them as well. And — this is the cool bit! — you can actually tell OpenSim where to find the configuration files with a command-line option.

Thanks to this, my problem now seemed to be simple: I would just need to have a different OpenSim.ini for each instance, and load them appropriately from the command line.

Well, not really. OpenSim.ini is a huge file (even though it got shortened since the developers rely on the reasonable defaults set in OpenSimDefaults.ini). The problem with keeping separate copies of huge files is that they tend to quickly get out of sync, too. And in truth, very little needs to change for each instance. In fact, just two things:

  1. The port it listens to for external TCP connections (http_listener_port)
  2. The location for the region files for this particular instance

It seems a waste to keep several duplicates of a 920-odd-line file around for just those two changes!

Now here comes the cool bit. There is actually a way to have OpenSim load OpenSimDefaults.ini, OpenSim.ini, and extra .ini files! This is documented on the page for configuration options.

So here is what I’ve done. I’ve created a new directory under .../<path to your opensim install>/bin, where OpenSim.exe and OpenSim.ini are, and called it grid. Under that directory, I created new directories for each instance. Let’s call them instance01, instance02, instance03 and so forth (I personally prefer to give them names related to their Estates).

Then each of those instanceXX folders gets its own OpenSim.ini file, and a new folder called Regions. You’ll see why.
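Just to make it concrete, here is a minimal shell sketch of that layout (the instance names are just examples; use your own):

cd /<path to your opensim install>/bin
mkdir -p grid/instance01/Regions grid/instance02/Regions grid/instance03/Regions
# each instanceXX directory will get its own OpenSim.ini and its own Regions/Regions.ini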

The grid/instanceXX/OpenSim.ini file is very simple; it just needs the following lines:

[Startup]
regionload_regionsdir = "/<path to your opensim install>/bin/grid/instanceXX/Regions/"

[Network]

http_listener_port = 90XX

Note that the http_listener_port has to be unique across instances. It’s unrelated to the individual regions’ ports: this one is TCP (for HTTP requests), while the regions use UDP to communicate with each other. If your grid is open to the public, don’t forget to open the extra ports on your firewall (I forgot and was surprised that “nothing worked!”).
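How you open those ports depends on your firewall; as a rough sketch, assuming Ubuntu’s ufw and instances using ports in the 9000-9010 range (adjust the ranges to your own setup):

sudo ufw allow 9000:9010/tcp    # the http_listener_port of each instance
sudo ufw allow 9000:9010/udp    # the regions' own ports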

Now it’s time to split your regions among the many instances. Just copy & paste the region information from Regions.ini into each /<path to your opensim install>/bin/grid/instanceXX/Regions/Regions.ini file. (If you’re like me, and were still using the old XML region configuration files instead of the new Regions.ini, it’s simple: just start with an empty Regions.ini and launch OpenSim as described below. OpenSim will ask for the first “new” region — just feed it the UUID you already have from the XML configuration file, and it’ll retrieve the assets for it and write a new Regions.ini file for you. Then you have to add further regions using the create region command. Again, the trick is to feed it existing region UUIDs and copy the relevant data from the XML files — e.g. name, location, etc. It’s a bit tedious, but it will work, and you won’t lose any data.)
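For reference, an entry in Regions.ini looks more or less like this; the name, UUID, coordinates and hostname below are made-up examples, so use the values from your own existing configuration:

[Example Region]
RegionUUID = 11111111-2222-3333-4444-555555555555
Location = 1000,1000
InternalAddress = 0.0.0.0
InternalPort = 9100
AllowAlternatePorts = False
ExternalHostName = your.host.name.or.ip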

At the end, you’ll have something like this:

<path to your opensim install>
|
+-- bin
    |
    +-- OpenSimDefaults.ini (don't change this one — ever!)
    +-- OpenSim.ini ("master" configuration file common to all instances)
    +-- grid
        |
        +-- instance01
        |   |
        |   +-- OpenSim.ini
        |   +-- Regions
        |       |
        |       +-- Regions.ini
        |
        +-- instance02
        |   |
        |   +-- OpenSim.ini
        |   +-- Regions
        |       |
        |       +-- Regions.ini
        |
       ...

and so forth, until the last instance.

Now the command line magic. Under Unix (Linux/Mac) you can start the console with

cd <path to your opensim install>/bin
screen -S instanceXX -d -m -l mono OpenSim.exe -hypergrid=true -inidirectory="./grid/instanceXX"

… if you’re running a hypergridded grid (as is my case); if not, leave -hypergrid=true out.
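Since each instance runs inside its own named screen session, you can peek at (and detach from) any console at any time:

screen -ls             # list the running sessions (one per instance)
screen -r instanceXX   # attach to that instance's console
# press Ctrl-a d to detach again without stopping the instance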

You can check that each instance is picking up the right configuration, because it will say so at the top of the console (or in the log):

2012-05-30 17:51:41,390 INFO - OpenSim.ConfigurationLoader Searching folder ././grid/instanceXX for config ini files
2012-05-30 17:51:41,413 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration settings
2012-05-30 17:51:41,414 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file <path to your opensim install>/bin/OpenSimDefaults.ini
2012-05-30 17:51:41,529 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file <path to your opensim install>/bin/OpenSim.ini
2012-05-30 17:51:41,597 INFO - OpenSim.ConfigurationLoader [CONFIG]: Reading configuration file <path to your opensim install>/bin/grid/instanceXX/OpenSim.ini

As you can see, this is exactly the order in which we want OpenSim to load our configuration files. First, the overall defaults for this version. Then our common, “master” configurations. And finally, the changes for each instance.

Simple!

Now if the OpenSim developers release a new version, all you need to do is copy the grid configuration over and launch all the instances with the new version. Assuming the core developers don’t change the command-line settings, this configuration should work… forever. Heh!
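In other words, an upgrade boils down to something like this rough sketch (paths and version numbers are just examples; stop the instances and back up your database first):

cd /opt
mv opensim opensim-old                        # keep the previous install around, just in case
tar xzf opensim-0.7.3.1.tar.gz                # unpack the new release
mv opensim-0.7.3.1 opensim                    # make it the live tree
cp opensim-old/bin/OpenSim.ini opensim/bin/   # re-check your overrides against the new OpenSimDefaults.ini!
cp -R opensim-old/bin/grid opensim/bin/       # per-instance configs and Regions carry over unchanged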

Splitting and rotating logs

All right, so the above solution is nice, but not perfect. Suppose you wish to get separate logs for each instance, instead of throwing them all into a single OpenSim.log file. There are some good reasons for that: on underpowered servers (somehow, in my two decades of Unix administration, I only ever get hold of old, underpowered servers, who knows why…), having lots of processes writing to the same file is not a terribly good idea, since each process has to lock the file for writing and release it to the other processes when finished — forcing them all to wait until that happens. Oh, sure, I know I’m talking about nanoseconds here. But the slower the server, the more those nanoseconds add up. This is the reason why modern multitasking operating systems use the syslog facility to write to log files that are shared by a lot of processes… but I digress. The point is that Mono, true to its Windows heritage (gasp!), does everything differently. It has its own library to deal with log writing. We’ll have to tackle that instead.

Also, the more this file grows, the more work the kernel has to do to append lines to it. Again, on superfast computers with lightning-speed RAID arrays, you probably won’t see any difference between writing to a 100-byte file and a 100-TByte file. But on old, underpowered servers, this will make a difference. Again, modern Unixes have another system tool to deal with that — Linux mostly uses logrotate, FreeBSD and macOS use newsyslog (there might be a few more). The principle is the same: you kill the application writing to the logfile, archive the current log file (say, appending a .1 to it, or even compressing it to save disk space), create a brand new log file, and re-launch the application. This can easily be accomplished in tandem with cron. These tools then have further options saying how often the check is performed, whether it should happen every day or only when the file hits a certain size limit, and so forth. In most cases, this is acceptable behaviour for stateless applications.

What are stateless applications? Well, pretty much everything that doesn’t require knowing “what happens to my data when I kill the application?” Web servers are a typical example: you request a URL and it’s an atomic operation — you get a result. The next request may come from the same browser or one that is completely different, but the webserver doesn’t care. It handles each request atomically and independently of the others (that’s why we need to save session data in cookies or on backend servers to make a web-based application persistent). Mail servers work similarly. In fact, most Internet-based services actually work that way. The ones offering persistent services that are not stateless usually store things on a database server. Of course, the database server itself should not be shut down just to rotate a logfile — and neither should OpenSim!

Enter the fantastic world of log4net. This is a framework created by the Apache foundation (aye, the same guys that give us that fantastic web server which powers something like 35% of the Web or so) which deals with configuring how logs should be written out to disk (or other facilities) in Mono/.NET applications. Because the OpenSim core developers included log4net in OpenSim, you can tweak it to get full log rotation for the multiple instances, without relying on external tools like logrotate or newsyslog — and, most importantly, you won’t need to restart OpenSim every time the log reaches a certain limit.

Here is an example of how to tweak OpenSim to write a new log as soon as the current one hits 1 MByte, keeping the last two logs as backups. You can accomplish way more complex things with log4net. Fortunately, the good OpenSim core developers have provided detailed instructions on how to do this.

Basically, all of this is done through a little-known configuration file called OpenSim.exe.config, which mostly holds the (XML) configuration for logging. So what you need to do is copy this file into each <path to your opensim install>/bin/grid/instanceXX/ directory, and then edit each copy to reflect its instance.
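A quick way to do the copying, as a sketch assuming the instance names used above:

cd /<path to your opensim install>/bin
for i in instance01 instance02 instance03; do
    cp OpenSim.exe.config grid/$i/     # one copy per instance; edit each afterwards
done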

Around line 21 (in OpenSim 0.7.3.1) you will find something like:

<appender name="LogFileAppender" type="log4net.Appender.FileAppender">
     <file value="OpenSim.log" />
     <appendToFile value="true" />
     <layout type="log4net.Layout.PatternLayout">
       <conversionPattern value="%date %-5level - %logger %message%newline" />
     </layout>
   </appender>Code language: HTML, XML (xml)

This is the standard behaviour: append new log lines to the file OpenSim.log and let it grow “forever”. We want to rotate logs, so we’ll use a log4net appender called RollingFileAppender in place of the default LogFileAppender. So edit this section and replace it with something like this:

<appender name="RollingFileAppender" type="log4net.Appender.RollingFileAppender">
  <file value="OpenSim.instanceXX.log" />
  <appendToFile value="true" />
  <maximumFileSize value="1000KB" />
  <maxSizeRollBackups value="2" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level - %logger %message%newline" />
  </layout>
</appender>

Note that I’ve changed the configuration shown on the OpenSim Wiki to place the logs for each instance on a separate file (OpenSim.instanceXX.log).

A bit further down in the very same OpenSim.exe.config, under the <root> section, change

<appender-ref ref="LogFileAppender" />

to

<appender-ref ref="RollingFileAppender" />

So now your directory layout should look something like this:

<path to your opensim install>
|
+-- bin
    |
    +-- OpenSimDefaults.ini (don't change this one — ever!)
    +-- OpenSim.ini ("master" configuration file common to all instances)
    +-- grid
        |
        +-- instance01
        |   |
        |   +-- OpenSim.ini
        |   +-- OpenSim.exe.config
        |   +-- Regions
        |       |
        |       +-- Regions.ini
        |
        +-- instance02
        |   |
        |   +-- OpenSim.ini
        |   +-- OpenSim.exe.config
        |   +-- Regions
        |       |
        |       +-- Regions.ini
        |
       ...

That’s it! Relaunch your instances, and you should now get an extra line at the very top of the log (right after launch) showing:

2012-05-29 09:37:39,969 INFO  - OpenSim.Application [OPENSIM MAIN]: configured log4net using "./grid/instanceXX/OpenSim.exe.config" as configuration file

This should show that everything has been loaded nicely 🙂
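(Once an instance has written a bit more than 1 MByte of log, you can double-check that rotation is working by looking at the bin/ directory, where you should see the live log plus, eventually, up to two rolled-over backups:)

ls -lh OpenSim.instanceXX.log*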

Starting up, shutting down and monitoring your instances

Ok, let’s face it: system administrators are lazy. The last thing you want to do is spend your time watching logs and consoles, checking whether a sim (or a whole instance!) has crashed, and restarting it from scratch. In my personal case, while struggling with a huge increase in memory consumption and CPU load when switching from 0.7.2 to 0.7.3.1, I also wanted a way to get alerts if something goes seriously wrong with the grid, to relaunch OpenSim when it crashes, and so forth.

There are a gazillion Unix utilities for doing all that and much more, many of them commercial, and a trillion more that are free and open source. I’ll stick with a relatively simple one, called monit.

monit is a relatively simple, rule-based system. You define a few rules for what should be checked and what happens when a check fails. At its simplest, it can ping a server for you and tell you if it’s responding. If not, it emits an alert. You can define what happens with the alert — log it to a file, send you an email, and so forth. You can also define what happens after multiple alerts — shutting down the machine if nothing replies any longer, for example (a bit drastic, but possible!). monit also works at the application level: you can make an HTTP call to a webserver and not only ping it, but also see if it replies to a request as you expect (say, check for a web page that should be there, and send you an alert if it isn’t). Since it can call pretty much anything, and not merely “send alerts”, you can get it to do more complex things: for example, test whether the database server replies to queries within a predefined time frame and, if not, attempt to repair/optimise tables; if that fails, restart the database server.

There are also a lot of “predefined” tests you can use, like checking memory consumption per monitored process, CPU consumption, or overall CPU load. All of that is presented on a nice, password-protected control Web page, where you can also manually stop/start services. And if you’re monitoring a whole array of servers — a huge grid, with multiple servers and multiple instances per server — you can use M/Monit, a “master” monit that monitors all other monits in a network from a single page. Very cool and simple to use.

Before you start commenting that package X does everything monit does and much more… rest assured, I’m aware of many of those packages. My choice of monit is not because it’s the “best”, but because the rules are very easy to write, and the interface is designed for stupid and lazy people like me 🙂

Apparently commercial OpenSim developers also use monit, so I took a peek at some configuration suggestions that they’ve made. The principle is simple:

  1. First you need to let monit know the Process ID (PID) of each instance, so that it can restart it if needed.
  2. Then you have to turn on some very simple statistics on each instance (OpenSim has lots of ways to give remote access to statistics) so that monit can check if the instance is “alive”. Dave Coyle suggests turning on JSON-based statistics (they are very compact and served very quickly) and letting monit check the SimFPS value. If SimFPS is at zero, this instance is probably dead.

We’ll use Dave Coyle’s suggested monit for opensim package but add a few tweaks to deal well with multiple instances of OpenSim.

The first thing is to go back to each OpenSim.ini configuration file for the instances. We’ll need to add two new lines. The first will save the instance’s PID to a file (which we’ll feed to monit later). The second turns on JSON-based statistics. So now your OpenSim.ini will look like:

[Startup]
    PIDFile = "/<path to your opensim install>/bin/grid/instanceXX/instanceXX.pid"
    regionload_regionsdir = "/<path to your opensim install>/bin/grid/instanceXX/Regions/"
    Stats_URI = "jsonSimStats"

[Network]
    http_listener_port = 90XX

Not too bad. Notice that traditionally, PID files are written under /var/run/<application name>, at least on Debian-inspired Linuxes and BSD-inspired ones (like macOS). I had a problem with Ubuntu: since OpenSim is running under its own user (opensim in my case), that user has no permission to write there. Of course, I could create a subdirectory and change its ownership. The trouble is that Ubuntu, on a system restart, will wipe /var/run (which makes sense, since no applications are running when it boots 🙂). This can be changed somewhere, somehow, but I was lazy again and just kept the PID files under a directory where I’m sure OpenSim can find them and write to them.

Now we come to monit. You might not be too surprised that these days, like most applications (<wink wink>), monit has a “master” configuration file — /etc/monit/monitrc — where reasonable defaults are set, and then you can add further configuration files for each application under /etc/monit/conf.d. This is what I’ve used.

First, let’s start with /etc/monit/monitrc.

I want to be able to administer monit via a Web interface (much cooler!), so I have uncommented the following bits:

set httpd port 2812 and
    allow admin:

Then, since my server has been misbehaving lately (I’m still discussing on OpenSimulator’s Mantis what is causing the high memory consumption and high CPU usage), I’ve also made sure I get some alerts for high resource consumption:

  check system rivendell
    if loadavg (1min) > 6 then alert
    if loadavg (5min) > 4 then alert
    if memory usage > 95% then alert
    if swap usage > 25% then alert

You should adjust that for your own server. Load average over 4 or 6 is really way too high!

And finally, I also wanted an email to be sent every time an “alert” is triggered. To do that, just uncomment the set alert line and put your own address in it:

  set alert <your email address>

As a side-note, if you’re running your server behind a NAT firewall, you have two choices. The first is to send all emails through your corporate/campus email server. That’s probably best and easiest. You can see how it gets configured using this example — in my case, I’m using a configuration to send mail via Gmail’s own servers:

  set mailserver smtp.gmail.com port 587
      username <your Gmail username>@gmail.com
      password "<your Gmail password>"
      using TLSV1
  set mail-format { from: <your Gmail username>@gmail.com }

This works like a charm. I actually don’t use Gmail for this, because I’d want to use the same account both for monit and inside OpenSim itself (which allegedly doesn’t support the TLS or SSL authentication that Gmail requires). Also, I’m paranoid about leaving sensitive passwords (like my own Gmail password!) lying around in plain view in a configuration file 🙂 Of course, you can simply register a new Gmail account just for monit.

If you’re very brave, you can run your own local mail server. On Ubuntu, the recommended mail transfer agent is Postfix. However, I don’t think it’s worth the time and patience to run a mail server and deal with spam, inter-server authentication, and so forth. Gone are the days when every Linux box on the planet ran its own mail server 🙂

If you’re running MySQL and possibly Apache (for extra grid services like groups, profiles, IMs between grids etc.) on the same box, you might wish to add the following files under /etc/monit/conf.d:

/etc/monit/conf.d/apache.conf:

# Monitoring the apache2 web service.
# monit will check the apache2 process using the given pid file.
# If the process name or pidfile path is wrong, monit will report
# a failure even though apache2 is actually running.
check process apache2 with pidfile /var/run/apache2.pid
# Below are the actions monit takes when the service gets stuck.
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# The admin will be notified by mail if any of the conditions below is met.
#if cpu is greater than 60% for 2 cycles then alert
#if cpu > 80% for 5 cycles then restart
if totalmem > 400.0 MB for 5 cycles then restart
if children > 250 then restart
#if loadavg(5min) greater than 10 for 8 cycles then stop
if failed host 127.0.0.1 port 80 protocol HTTP then restart
if 3 restarts within 5 cycles then timeout
group server

These are pretty much standard tests for Apache, which I’ve tweaked slightly to better reflect my own server’s setup.

/etc/monit/conf.d/mysql.conf:

check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout

All right. Do a service monit start and you should be able to log in to http://<your internal ip address>:2812/ and see your monit already doing something 🙂
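(If monit refuses to start, or behaves oddly, let it check the syntax of everything you’ve just written; it will point out typos in the rule files:)

sudo monit -t    # syntax check of /etc/monit/monitrc and the files it includes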

Now it’s time to tackle OpenSim. This is not going to be easy, so bear with me for a moment.

My source of inspiration, again, was Dave Coyle. Dave actually uses an external script to launch OpenSim (and stop it), so that monit just calls that script in start program and stop program. All very nice, but it was designed for single-instance configurations — or, if you wish, you could just make a copy of that bash script for each instance.

As I said, I dislike the idea of having lots of junk around with “similar” configurations, so I’ve tweaked Dave’s script to handle all instances from the same location and to take my “special” configuration into account (i.e. instances launching their specific .ini files and region files under ./grid/). Also, I rather fancy using screen so I can log in directly to each console. So, in my proposed setup, each instance gets a screen console named after the instance itself.

Here goes the tweaked script:

#!/bin/bash
#
# for usage, run without arguments
#
# see ../README for setup instructions.
#

# Original code by Dave Coyle (http://coyled.com)
# Tweaks by Gwyneth Llewelyn (http://gwynethllewelyn.net/)

# Requires bash 4

# The original script assumed that you'd be running one sim per instance
#  and launch a different script per sim.
# These changes assume you have multiple instances with multiple sims each,
#  and that all instances are launched from the same place, each with
#  a unique identifier.

# List of valid instances. Make sure you add all of your instances here
declare -A instances
for index in instance01 instance02 instance03 instance04 instanceXX
do
        instances[$index]=1
done

show_help() {
    echo "usage: ./opensim {start|stop|restart|console} <instance>"
    echo -n "    where <instance> is one of: "
    echo ${!instances[*]}
}

check_user() {
    if [ $USER != 'opensim' ]; then
        echo "This must be run as user opensim"
        exit 1
    fi
}

setup() {
    if [ ! $1 ]; then   
        show_help
        exit 1
    else
        SIM=$1
    fi

    if [[ ${instances[$SIM]} ]]; then
        MONO="/usr/bin/mono"
        OPENSIM_DIR="<your opensim directory>"
        PID="$OPENSIM_DIR/bin/grid/${SIM}/${SIM}.pid"
        SCREEN="/usr/bin/screen"
        GRID_DIR="./grid"
        # set GRID_DIR to the subdirectory where your individual
        #  instance configuration is
    else
        echo "Sorry, I've never heard of sim ${SIM}.  Exiting."
        exit 1;  
    fi
}

do_start() {
    if [ ! $1 ]; then
        show_help
        exit 1
    else
        SIM=$1
    fi

    setup $SIM
    check_user

    cd ${OPENSIM_DIR}/bin && $SCREEN -S $SIM -d -m -l $MONO OpenSim.exe \
      -hypergrid=true -inidirectory="$GRID_DIR/$SIM" \
      -logconfig="$GRID_DIR/$SIM/OpenSim.exe.config"
}

do_kill() {
    if [ ! $1 ]; then
        show_help
        exit 1
    else
        SIM=$1
    fi

    setup $SIM
    check_user

    if [ -f $PID ]; then
        kill -9 `cat $PID`
    else
        echo "Sorry, ${SIM} PID not found."
        exit 1
    fi
}

do_console() {
    if [ ! $1 ]; then
        show_help
        exit 1
    fi

    setup $1

    # attach to the running instance's screen session;
    # detach again with Ctrl-a d so the instance keeps running
    $SCREEN -r $SIM
}

case "$1" in
    start)
        do_start $2
        ;; 
    stop)
        do_kill $2
        ;;
    kill)
        do_kill $2
        ;;
    restart)
        do_kill $2
        do_start $2
        ;;
    console)
        do_console $2
        ;;
    *)
        show_help
        exit 1
        ;;
esac

Make sure you make this script world-executable (e.g. chmod a+x /usr/local/bin/opensim). Now switch to the user you’re running OpenSim under (in my case it’s opensim; if not, you’ll have to change that in the script as well) and try it out:

/usr/local/bin/opensim start instance01

If all went well, screen -ls should now show you instance01 available and allow you to connect to it.

This is the time to test starting and stopping instances to make sure you don’t have any errors!
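Something along these lines works for me (instance names are, as always, just examples):

/usr/local/bin/opensim start instance01   # launch it
screen -ls                                # a session named instance01 should be listed
/usr/local/bin/opensim stop instance01    # kill it again; the session should disappear
/usr/local/bin/opensim restart instance01 # and this should bring it right back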

Now let’s automate it. Go back to /etc/monit/conf.d and create a new file, opensim.conf, inside it:

/etc/monit/conf.d/opensim.conf:

# manage the OpenSim process for Your Sim
#
# usage:
#     monit start your_sim
#     monit stop your_sim
#     monit restart your_sim
#
# see 'daemon' setting in /etc/monit/monitrc for the cycle length.
# on ubuntu/debian, this is overridden by the CHECK_INTERVALS var in
# /etc/default/monit .  the below assumes you've set it to 30 seconds.
#
# if process dies, will restart sim within 30 seconds.  if process
# dies 5 times in as many tries, will stop trying and send email
# alert.
#
# if SimFPS drops to 0 for 2 minutes, restart.
#
# if process CPU usage stays above 1200% for 2 minutes (4 cycles), restart.
#
# see ../README for configuration instructions.
#

# Original code by Dave Coyle (http://coyled.com/2010/07/07/monit-and-opensim/)

# ROBUST server

check process ROBUST with pidfile <your opensim directory>/bin/grid/ROBUST.pid
    start program = "/bin/bash -c 'cd <your opensim directory>/bin;/usr/bin/screen -S Robust -d -m -l mono Robust.exe -inifile=Robust.HG.ini'"
        as uid opensim and gid opensim
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed host localhost port 8003
        for 4 cycles
        then restart
    group opensim

check process instance01 with pidfile <your opensim directory>/bin/grid/instance01/instance01.pid
    start program = "/usr/local/bin/opensim start instance01"
        as uid opensim and gid opensim
    stop program = "/usr/local/bin/opensim stop instance01"
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed url http://localhost:9001/jsonSimStats/
        and content != '"SimFPS":0.0,' for 4 cycles
        then restart
    if failed url http://localhost:9001/jsonSimStats/
        and content == '"SimFPS":' for 4 cycles
        then restart
    group opensim

check process instance02 with pidfile <your opensim directory>/bin/grid/instance02/instance02.pid
    start program = "/usr/local/bin/opensim start instance02"
        as uid opensim and gid opensim
    stop program = "/usr/local/bin/opensim stop instance02"
    if cpu usage > 1200% for 4 cycles then restart
    if 5 restarts within 5 cycles then timeout
    if failed url http://localhost:9000/jsonSimStats/
        and content != '"SimFPS":0.0,' for 4 cycles
        then restart
    if failed url http://localhost:9000/jsonSimStats/
        and content == '"SimFPS":' for 4 cycles
        then restart
    group opensim

[...]

and so forth for all the instances.

A few explanations. I have added monitoring for the ROBUST server as well, but there is no special script for it (Dave Coyle’s script only addresses OpenSim instances, not ROBUST), so I just launch it with a one-liner directly in the monit rule. (As for the ROBUST.pid file referenced above: if I recall correctly, ROBUST accepts a PIDFile setting in its [Startup] section just like OpenSim does, so point it at that location.) ROBUST doesn’t seem to have many statistics available (unlike the OpenSim instances), but it will reply to HTTP requests on port 8003, assuming that’s the default port you’re using for ROBUST; check your Robust.ini or Robust.HG.ini file for:

[Network]
port = 8003

This should be the right port to check for ROBUST.
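(A quick sanity check from the shell; this merely tests that something is listening on that port, which is essentially what monit’s failed host test does:)

nc -z localhost 8003 && echo "ROBUST is listening" || echo "nothing on port 8003"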

Unfortunately, monit doesn’t (yet?) allow you to create a single “template” for all the instances, so there is no easier way than copying & pasting the same block over and over again. I hate doing that, but it seems to be the only way — unless someone knows a better trick. As you can see, the test for whether an OpenSim instance is still alive is to check its SimFPS — 0.0 means a “dead region”.

The localhost:90XX port is whatever you’ve configured for each instance (it’s the instance’s http_listener_port). And there are a few extra checks for insanely high loads; you could also check memory consumption (remember, the more prims on a region, the more memory the instance will require) and send alerts if something seems suspicious: for example, a visitor rezzing a lot of prims and starting a lot of scripts will push CPU and memory usage up, so you can get an alert when that happens.
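You can also poke at the statistics URL by hand, to see exactly what monit will be matching against; for an instance listening on port 9001, for example:

curl http://localhost:9001/jsonSimStats/
# a small JSON blob comes back; the "SimFPS" field is the one monit checks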

That’s pretty much it. Restart monit with service monit restart and all instances should now appear on monit’s Web page. Not only can you see some usage statistics, you can also easily click there to start/stop each instance 🙂 Very easy and cool! An extra bonus: when the server reboots after a power failure or the like, monit, as it launches, will check whether OpenSim is up and promptly launch everything that isn’t.

This is not perfect. Sometimes instances take a long time to launch. When that happens, they might be busy sending “neighbour woke up” messages, or trying to contact ROBUST, and fail to come up in time. This will make monit attempt to relaunch them over and over again, which might not always succeed if they take too long. In some cases, I’ve seen instances come up without any regions in them. Looking at the monit manual, I can see there are ways to deal with timeouts, and with instances that take longer than usual to start, by delaying the tests for a few more cycles (with the 30-second cycle assumed above, the checks span about 2 minutes, which should be enough in most cases, but sometimes it’s too short).

Still, it works for me, so the best I can hope for is that this is useful for someone else! Enjoy, and let me know about any suggested improvements/workarounds.

Oh, a final note. I could put this up on some “official” Wiki or so. Unfortunately, I’m very unlucky with dealing with Wiki admins. They tend to kick me out all the time and delete all my content for no rational reason whatsoever. Thus I’m reluctant to spend hours writing information like this just to get it deleted at a whim 🙁 Any OpenSim-related Wiki administrator is more than welcome to copy this article and tweak it to their style and personal taste on their own Wikis, but don’t expect me to go over there and maintain it 🙂

Slightly edited & revised on 2020/05/15 because apparently this is getting linked from a post on Starflower Bracken’s website, which may attract some visitors in search of more useful information… (thanks for the link, Starflower! 🙂 )