
auto-restart

Created by justin. Last edited by justin, 3 years and 143 days ago. Viewed 5,014 times. #2
Every now and again, a service may go wacky. Perhaps some software that it must work with is not prepared for the latest version and falls behind; perhaps there are hardware problems which will take some time to track down; or maybe the software itself could stand to be more robust.

In all of these cases, if you rely on an application, and from time to time it either dies completely or simply stops responding to requests, you may want something which runs periodically, causing an automatic restart if a simple check does not succeed. There are very advanced monitoring systems which try to automate this sort of thing, but I prefer to localize a solution so that automated restarts do not require automated remote login.

So, when this happens, and it has happened to me recently with some older Zope 2.7 and 2.8 instances, I set up auto-restarts. In most cases I have upgraded to more recent code, and the problems have been gone for some time, but I have kept the auto-restart around because it is not resource-intensive, and I know that I will probably have some warning before any users notice that a problem has resurfaced in our system.

In any case, here's the script; it's pretty straightforward. I have written it to run under daemontools in a manner which I'll explain, but it can be run directly from cron every minute or so. Adjust to taste:

#!/bin/sh

exec 2>&1

echo "--- checking zope.. ---"
if ! wget --spider -t 1 -T 5 http://cmc.sigchi.org/ ; then
  # the check failed: restart only if the previous check failed too
  if [ -f caught ]; then
    rm caught;
    echo "restarting zope..";
    echo "restarting zope on turing.acm.org.." | mail -s "FATAL zope restart on turing." admin@cmc.sigchi.org;
    svc -t /service/zope-turing-orig
    # if the process is especially stubborn and ignores TERM
    # svc -k /service/zope-turing-orig
    # if you don't use daemontools for zope, you may want something like this:
    # /home/zope/instance/zope-turing-orig/bin/zopectl restart
    # or, if stubborn
    # /home/zope/instance/zope-turing-orig/bin/zopectl stop
    # /home/zope/instance/zope-turing-orig/bin/zopectl start
    # stop and start is good in case the process simply dies,
    # as many service "restart" scripts don't handle this well
  else
    # first failure: leave a marker and wait for the next run
    touch caught;
  fi
else
  # the check succeeded: clear any marker left by an earlier failure
  if [ -f caught ]; then
    rm caught;
  fi
fi
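
The script only restarts Zope when two checks in a row fail; the 'caught' file is what carries the first failure over to the next run. As a rough sketch of just that logic, here it is with the check and restart commands pulled out into variables. The names CHECK_CMD, RESTART_CMD, STATE_FILE and check_once are made up for illustration and are not part of the script above; the e-mail notification is left out to keep it short.

```shell
#!/bin/sh
# Sketch of the two-strike check: restart only after two consecutive failures.
# CHECK_CMD, RESTART_CMD and STATE_FILE are hypothetical names; the defaults
# mirror the commands used in the script above.
CHECK_CMD=${CHECK_CMD:-"wget --spider -t 1 -T 5 http://cmc.sigchi.org/"}
RESTART_CMD=${RESTART_CMD:-"svc -t /service/zope-turing-orig"}
STATE_FILE=${STATE_FILE:-caught}

check_once() {
    if $CHECK_CMD >/dev/null 2>&1; then
        # Healthy: forget any earlier failure.
        rm -f "$STATE_FILE"
        echo "ok"
    elif [ -f "$STATE_FILE" ]; then
        # Second failure in a row: clear the marker, then restart.
        rm -f "$STATE_FILE"
        echo "restarting"
        $RESTART_CMD
    else
        # First failure: remember it and wait for the next run.
        touch "$STATE_FILE"
        echo "first failure"
    fi
}
```

Nothing runs until check_once is called, so a final line containing just check_once is needed to turn this into a cron job or a 'run' script. Run once a minute, this means a restart happens somewhere between one and two minutes after the service stops answering, and a single slow response never triggers one.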

If you aren't familiar with daemontools, you should really check it out. It's great stuff: a very simple tool written by the author of qmail, Dan J. Bernstein. You can read all about it here:

http://cr.yp.to/daemontools.html

To make a long story short, daemontools is a simple service monitor, and it has a couple of handy features which I like for this purpose:

  • Automatic log rotation
  • The ability to control a service based on filesystem permissions

So, I can run this script as user / group 'daemon', and simply give this group access to a few control files which will be read by the controlling process. Also, each time the check is run from cron, daemontools logs it in a nice, tidy way, which gives me amazingly verbose information about the last few thousand minutes of activity without filling up my drive. You could use logrotate for this, and you should if you are not using daemontools, but I find djb's tools handy, and on my servers, they are already installed. :)

So, for me, this is the 'run' script in a daemontools service called checkzope. This service directory also contains an empty file named 'down' at its root, which tells daemontools not to actually start this script as a daemon. My crontab line looks like this:

* * * * * /command/svc -o /service/checkzope

The 'svc' command is the main user interface to daemontools. Generally I would pass '-t' for a restart, '-d' to stop, or '-k' to send SIGKILL, but in this case we only want the script to run once, which is indicated by '-o'.
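
For anyone who has not built a daemontools service before, the checkzope layout described above might be put together roughly like this. Treat it as a sketch under assumptions rather than my exact setup: SVC_DIR is a made-up variable pointing at a scratch directory, the 'run' body is only a placeholder for the check script, and the log/run script using multilog (with the sticky bit on the service directory so that svscan starts the logger) reflects my understanding of daemontools 0.76, so check it against your version.

```shell
#!/bin/sh
# Sketch: lay out a run-once daemontools service directory for the check.
# SVC_DIR is a hypothetical variable; it defaults to a scratch directory so
# the layout can be inspected before it is linked under /service.
SVC_DIR=${SVC_DIR:-./checkzope}

mkdir -p "$SVC_DIR/log"

# 'run' is the check script itself; only a placeholder body is written here.
cat > "$SVC_DIR/run" <<'EOF'
#!/bin/sh
exec 2>&1
echo "--- checking zope.. ---"
# the rest of the check script goes here
EOF

# An empty 'down' file tells supervise not to start the service on its own.
: > "$SVC_DIR/down"

# log/run feeds the script's output to multilog, which timestamps each line
# and rotates the log files kept under ./main.
cat > "$SVC_DIR/log/run" <<'EOF'
#!/bin/sh
exec multilog t ./main
EOF

chmod 755 "$SVC_DIR/run" "$SVC_DIR/log/run"
# Assumption: svscan only starts the log service for sticky directories.
chmod 1755 "$SVC_DIR"
```

Once the directory is linked into /service, 'svc -o /service/checkzope' runs the check a single time, and the output ends up in log/main/current. Letting the 'daemon' group trigger it should then be a matter of group permissions on the control files that supervise creates under the service's supervise directory.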

Keep in mind that auto-restart should not be a permanent solution. I've left mine in place because it doesn't hurt anything, but I am not relying on it for stability. In fact, I really don't like it when the auto-restarts happen, but I did a bit of reading on how Google's infrastructure works, and one central idea was that they assume everything will fail, with automated restarts being a tame, necessary evil. I look at it as insurance. I hope my application never fails, but if it does, I would rather receive an e-mail than be woken up or have services be unavailable.

:)
