Stanford Digital Library Testbed Development


Department of Computer Science
Stanford University
Stanford, CA
InfoBus Maintenance Instructions

Factory Maintenance

The factory is a process running as testbed that monitors proxies on the InfoBus. It helps keep the system up and running.

The Basics

The factory starts and maintains a set of DL services. It periodically pings the services, restarting any that have died. The factory itself is supposed to be kept up by a cron job, so in general even a system reboot shouldn't keep the DL project down.
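The ping-and-restart behavior described above can be pictured with a minimal sketch. This is not the factory's actual code: the function names, the `is_alive` and `restart` callbacks, and the 60-second interval are all assumptions made for illustration.

```python
import time


def supervise_once(services, is_alive, restart):
    """Ping each service once and restart the ones that do not answer.

    `is_alive` and `restart` are hypothetical callbacks standing in for
    whatever the real factory does to ping and relaunch a service.
    Returns the names of the services that were restarted.
    """
    restarted = []
    for name in services:
        if not is_alive(name):
            restart(name)
            restarted.append(name)
    return restarted


def supervise_forever(services, is_alive, restart, interval_seconds=60):
    """Repeat the check periodically (the interval is an assumed value)."""
    while True:
        supervise_once(services, is_alive, restart)
        time.sleep(interval_seconds)
```

Keeping the single pass separate from the endless loop makes the check easy to try by hand with stub callbacks before letting it run unattended.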

We actually run two factories, one on Coke and one on Grunion. The one on Coke runs the DL services; the one on Grunion runs the DLITE backend (i.e., starting pods, etc.). Each factory keeps a list of its active services, named .Stanford.EDU_active.list. You shouldn't modify this file directly. The factories also write error logs for everything they maintain. These are kept in whatever directory the factory is started in -- dldev/src/SDLFactory for the pods and dldev/src/SDLFactory/dl/sunos5 for the services (note that any additional logs, shelves, etc. will be kept in this directory as well). These directories are important -- 99% of the time, if you're manipulating a factory, you want to be calling programs from that directory, because that's where the factory's active list resides.

If a service fails to start (i.e., raises an exception while starting), the factory will remove it from its active list. To get it back in, first fix the bugs, then call registerservices.py and startall (see below). That will add the service back to the active list.
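As a rough illustration of that rule, the sketch below tries to start every service and drops the ones whose startup raises. The names `start_all` and `start` are hypothetical; the real factory's internals are not documented here.

```python
def start_all(active, start):
    """Try to start every service named in `active`.

    `start` is a hypothetical callback that launches one service and
    raises an exception if startup fails. Returns a pair:
    (names still active, [(dropped name, error text), ...]).
    """
    still_active = []
    dropped = []
    for name in active:
        try:
            start(name)
        except Exception as exc:
            # A service that raises while starting leaves the active list.
            dropped.append((name, repr(exc)))
        else:
            still_active.append(name)
    return still_active, dropped
```

The dropped entries carry the error text, which is the kind of information you would then look for in the factory's logs before re-registering the service.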

Another thing to note is that the factory will occasionally start multiple copies of the same thing. This has been noted for the HomeProvider, the summarizer, Interbib, and probably a couple of other things too. I have no idea why it gets multiple copies of these things in its active list, but killing them won't help. You pretty much have to do a clean restart (see below) to pare it back down to one copy per program.
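To spot this condition, a small check for repeated names can help. The format of the active list is not described in these instructions, so this sketch assumes you have already pulled out the service names by some other means (for example, by reading them off the process listing); `duplicated_services` is a hypothetical helper.

```python
from collections import Counter


def duplicated_services(names):
    """Return {name: count} for every service name seen more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}
```

A non-empty result only tells you that duplicates exist; getting back to one copy per program still takes a clean restart.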

Starting The Factory

Recall that the factory gets started by a cron job. If you want to start it by hand, run each of the following as testbed:

on Coke (DL InfoBus proxies):

        cd ~/dldev/src/SDLFactory/dl/sunos5
        SDLFactory -daemon >& ~/Logs/factory.coke.log &

on Grunion (DLITE):

        cd ~/dldev/src/SDLFactory
        SDLFactory -daemon >& ~/Logs/factory.grunion.log &
Note that this will start up all services that the factory was maintaining according to its active list. To start "fresh", see Restarting Clean below.

Adding A Service

To add a service to the factory on Coke, you need to edit two files, both in the SDLFactory directory: RegisterServices.py, and startall.

RegisterServices.py

RegisterServices.py registers the services that are supposed to be run with the factory (this is how the active list can be modified). Near the bottom of the file, you'll see a bunch of commands like:

        Register(f, , , , , ???)
You need to add one of these for each service you add. I'm not sure what all these parameters do. In general, just copy a similar service's entry. Attribute models and search services are very similar. Use Interbib, SQueryTranslator, and MetaWeb as examples of how to register services that publish multiple items in a single server.

Startall

startall actually starts all the services. Services may be registered but not started, so if you want your service to be run by the factory, add the field you added to registerservices.py to this list. Now you need to kick the factory. To do this, make sure you're in the right directory (dldev/src/SDLFactory/dl/sunos5 for Coke) and call registerservices.py and startall. If this doesn't work, restart everything (see Restarting Clean below).

Monitoring Services

Call
      ps -auwx | grep python
to see what DL services are currently running. If a service has crashed, or is acting strangely, check its logs -- for Coke services, the logs are in dldev/src/SDLFactory/dl/sunos5. If a pod acts up, look in dldev/src/SDLFactory for its logs.
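If you check often, the listing can be filtered programmatically. The sketch below assumes the BSD-style layout of eleven whitespace-separated columns with the command last (USER, PID, ..., COMMAND); if your `ps` prints a start date containing a space, the column split will be off. The function name is hypothetical.

```python
def python_processes(ps_output):
    """Return (pid, command) for each `ps -auwx` line that runs python.

    Assumes eleven whitespace-separated columns with the PID second and
    the command last. The header line and the `grep` process itself are
    skipped.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or a line in an unexpected layout
        command = fields[10]
        if command.startswith("grep"):
            continue
        if "python" in command:
            found.append((int(fields[1]), command))
    return found
```

The PIDs it returns are what you would pass to kill if a service needs to be stopped by hand.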

Restarting Individual Services

This is for Coke: If a service dies, the factory will generally restart it. However, if you're impatient, or the factory seems to be slacking, try the following, in order of severity:

Restarting Clean

This is for Coke:

In addition to RegisterServices.py and startall, there is also a script called killservices which kills all processes being run by the factory. It takes one parameter, the active list. For example:

    ./killservices Coke.Stanford.EDU_active.list
So, to do a "full restart" of the factory, what you want to do is kill all current services, kill the factory, remove the active list, restart the factory, re-register the services, and start them. So the sequence is (from SDLFactory/dl/sunos5):
    ../../killservices Coke.Stanford.EDU_active.list
    rm *.shelve *.list
    kill 
    SDLFactory -daemon >& ~/Logs/factory.coke.log &
    ../../registerservices.py
    ../../startall
Note that the cron job will restart the factory at some point if you don't beat it to it.
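For reference, the same sequence can be written down as data, which makes it easy to review before running anything. This sketch only builds the command strings for Coke and executes nothing; `clean_restart_commands` and its `factory_pid` parameter are hypothetical, and the PID is something you have to look up yourself (for example, from the process listing).

```python
def clean_restart_commands(factory_pid):
    """Return the Coke clean-restart commands, in order, as strings.

    Meant to be run from SDLFactory/dl/sunos5. `factory_pid` is the
    process id of the running factory, which the operator must supply.
    """
    active_list = "Coke.Stanford.EDU_active.list"
    return [
        "../../killservices " + active_list,   # kill all current services
        "rm *.shelve *.list",                   # remove shelves and the active list
        "kill " + str(factory_pid),             # kill the factory itself
        "SDLFactory -daemon >& ~/Logs/factory.coke.log &",  # restart the factory
        "../../registerservices.py",            # re-register the services
        "../../startall",                       # start them
    ]
```

Printing the list first, one command per line, gives you a checklist to paste from rather than retyping each step.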

Adding A New Model or Subcollection

These instructions assume that the new model was created using Yuhua's attribute model editor.
Digital Libraries Webmaster
Webmaster@diglib.stanford.edu