Wednesday, April 25, 2007

The Ultimate Coolness of Signal Catching

Patience is a virtue, but I seem to be lacking it. I often run long computer simulations and continue working on my code while the program runs in a different window. The end result of a simulation is a set of data files containing the final solution.
What a waste of resources, however, when the simulation ends and turns out to have become unstable at an early stage, producing senseless data from then on. Or imagine you really have to leave the office and want to take your laptop home, but the simulation is only halfway done. Pressing Ctrl-C is easy, but then you have to restart the simulation from scratch at home. I would like to be able to save solution data at any intermediate time.
I decided to have a look at the use of UNIX signals to solve these little annoyances.

The idea is neither original, nor very difficult; it's just extremely handy! Most of my simulations consist of a large loop in the main routine that starts at simulation time Tstart and ends at time Tend, taking small time steps. Here's sample terminal output to get an idea:

[..]
#        1967  time =   0.1986020
finished MM after            2  steps (mmres=  9.5645049E-006 ).
#        1968  time =   0.1986853
finished MM after            2  steps (mmres=  9.8183985E-006 ).
[..]

When I press Ctrl-Z ('suspend'), this is what happens:

finished MM after            2  steps (mmres=  9.7805565E-006 ).
#        2594  time =   0.2427970  (Ctrl-Z pressed)
Caught suspend. Saving current data...
Continuing execution...
finished MM after            2  steps (mmres=  9.9850709E-006 ).
#        2595  time =   0.2428664

And when I press Ctrl-C ('interrupt'), this is what happens:

finished MM after            5  steps (mmres=  1.0567982E-005 ).
#        2617  time =   0.2444400 (Ctrl-C pressed)
Caught interrupt. Saving current data...
Exiting.

Note that I deliberately use the SIGINT (Ctrl-C) and SIGTSTP (Ctrl-Z) signals, because those are sent when pressing the stated key combinations. The reserved user signals SIGUSR1 and SIGUSR2 would have to be sent for example by the kill command, which is inconvenient for my use.
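
For comparison, here is a rough C++ sketch of what the SIGUSR1 route could look like. The names saveRequested, onUserSignal, watchUserSignal and consumeSaveRequest are made up for this illustration and are not part of any library. SIGUSR1 is a POSIX signal, so this assumes a UNIX-like system.

```cpp
#include <csignal>

// Set by the handler, cleared by the main program. volatile sig_atomic_t is
// the object type the C and C++ standards guarantee a handler may write to.
volatile std::sig_atomic_t saveRequested = 0;

// Hypothetical handler: it only records that a save was requested.
extern "C" void onUserSignal(int) { saveRequested = 1; }

// Installs the handler for SIGUSR1; returns false if installation failed.
bool watchUserSignal()
{
    return std::signal(SIGUSR1, onUserSignal) != SIG_ERR;
}

// Returns true once for each pending request, then resets the flag.
bool consumeSaveRequest()
{
    if (!saveRequested) return false;
    saveRequested = 0;
    return true;
}
```

The main loop would call consumeSaveRequest() once per time step. The request itself has to come from outside, for example with kill -USR1 followed by the process ID in another terminal, which is exactly the extra step I wanted to avoid.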

Implementation in Fortran

On UNIX systems, signal handling is provided by the functions declared in the system-wide C header signal.h. Fortunately, libsigwatch provides some basic but sufficient helper functions that allow us to define which signals we want to catch, so that we can take special action when one arrives.

Here is my relevant Fortran (95) source code:

module mmsignal
use mmio

contains

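subroutine initsighandles()
  integer  watchsignal
  integer :: status
  ! Signal numbers are platform-specific: SIGTSTP is 20 on Linux (x86, ARM)
  ! but 18 on macOS and the BSDs. SIGINT is 2 on all of these.
  status = watchsignal(2)  ! SIGINT (Ctrl-C)
  if (status == -1) print *, 'Warning: could not watch SIGINT'
  status = watchsignal(20) ! SIGTSTP (Ctrl-Z)
  if (status == -1) print *, 'Warning: could not watch SIGTSTP'
end subroutine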

subroutine catchsignals()
  integer getlastsignal
  integer :: lastsig

  lastsig = getlastsignal()

  if (lastsig .eq. 2) then
    print *, 'Caught interrupt. Saving current data...'
    call save_state
    print *, 'Exiting.'
    stop
  end if
  if (lastsig .eq. 20) then
    print *, 'Caught suspend. Saving current data...'
    call save_state
    print *, 'Continuing execution...'
  end if
end subroutine

end module

The routine watchsignal(int) installs special signal handling for the requested signal: each time the signal is received, its number is stored in a variable that can be read by calling the routine getlastsignal(). The action taken in response is not provided by the sigwatch library: I call my own routine catchsignals() within the main loop's body. It checks whether the most recent signal was either SIGINT or SIGTSTP, and does nothing otherwise.
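
To make the record-and-poll idea concrete, here is a minimal C++ sketch of how such a mechanism might work. This is not libsigwatch's actual source: the names lastSignal, recordSignal, watchSignal, takeLastSignal and pollSignals are invented for illustration, and clearing the stored number after reading it is my assumption.

```cpp
#include <csignal>

// Number of the most recently caught signal, or 0 if none is pending.
volatile std::sig_atomic_t lastSignal = 0;

// Handler: only records the signal number; all real work happens later.
extern "C" void recordSignal(int sig) { lastSignal = sig; }

// Hypothetical counterpart of watchsignal(): start recording signal `sig`.
// Returns 0 on success and -1 on failure.
int watchSignal(int sig)
{
    return std::signal(sig, recordSignal) == SIG_ERR ? -1 : 0;
}

// Hypothetical counterpart of getlastsignal(): returns the pending signal
// number and clears it (assumed here), so each signal is reported once.
// A signal arriving between the read and the reset would be lost;
// that is acceptable for a sketch.
int takeLastSignal()
{
    int sig = lastSignal;
    lastSignal = 0;
    return sig;
}

// One check per loop iteration. Returns false when the loop should stop.
bool pollSignals(void (*saveState)())
{
    switch (takeLastSignal()) {
    case SIGINT:  saveState(); return false;  // save, then stop
    case SIGTSTP: saveState(); return true;   // save, then carry on
    default:      return true;                // nothing pending
    }
}
```

The point of the design is that the handler itself does almost nothing. The expensive and potentially unsafe work (writing data files) happens in the main loop, at a moment when the solution is in a consistent state.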

To include the sigwatch library in your program, just add it to your linking stage and you are done:

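> gfortran finvol.o mmadapt.o mmbounds.o mmconfig.o mmdata.o mmio.o mmphys.o mmsignal.o mmsolve.o mmusr.o -lsigwatch -o mmsolve

(The relevant part is the -lsigwatch flag. It is listed after the object files because some linkers resolve libraries strictly in command-line order.)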

Implementation in C(++)

In C++, we can directly use the functions in the (C-) signal library. I used this in a classroom package implementing genetic algorithms. The following code is incomplete, but all relevant parts are included:

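#include <iostream>
#include <csignal>
#include <unistd.h>  // write(), STDOUT_FILENO

#include "galib.h"

// Written by the signal handler, read by the main loop.
volatile std::sig_atomic_t stopGiven = 0;

void handleUserStop(int sig_num)
{
  stopGiven = 1;
  // std::cout is not async-signal-safe, so use write() inside the handler.
  const char msg[] = "*** User interrupt caught.\n"
                     "*** Please stand by while current generation is completed and saved.\n";
  write(STDOUT_FILENO, msg, sizeof msg - 1);
}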

int main(int argc, char **argv)
{
  signal(SIGTSTP, handleUserStop);

  // ...

  while (!stopGiven) {
    p->nextGeneration();
    // ...
  }

  // In case of a user stop signal or when requested, save last generation
  if (stopGiven || parameters.savelastgen) {
    std::cout << "Writing generation #" << i << " to file \"" << LASTGENFILE << "\"." << std::endl;
    p->writePopulation(fpopout);
  }
  fpopout.close();

  delete p;
  return 0;
}

Notice how the custom signal handler is now installed directly through the signal library, in one step, by the call signal(SIGTSTP, handleUserStop).
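
One caveat: the exact behaviour of signal() has varied between UNIX flavours (on some, the handler is reset to the default after the first delivery). POSIX offers sigaction() as the better-specified alternative. Below is a small sketch of the same installation step using it; stopRequested, requestStop and installStopHandler are names chosen for this example, and the SA_RESTART flag is my choice, not something the program above requires.

```cpp
#include <signal.h>   // POSIX sigaction(); not part of standard C++

// Set by the handler, polled by the main loop.
volatile sig_atomic_t stopRequested = 0;

extern "C" void requestStop(int) { stopRequested = 1; }

// Installs requestStop for `sig` via sigaction(). Returns true on success.
bool installStopHandler(int sig)
{
    struct sigaction action = {};
    action.sa_handler = requestStop;
    sigemptyset(&action.sa_mask);  // block no extra signals while handling
    action.sa_flags = SA_RESTART;  // restart interrupted system calls if possible
    return sigaction(sig, &action, nullptr) == 0;
}
```

In main(), the call installStopHandler(SIGTSTP) would take the place of signal(SIGTSTP, handleUserStop). With sigaction() the handler stays installed after each delivery.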

Now that I have saved intermediate solution data, I can extend the main program to offer resume functionality, using the data file for initialization. That task is easy and has nothing to do with signals, so I will not cover it here.

I still benefit from the above functionality almost every week. I hope it serves you well too.

4 Comments:

Blogger Unknown said...

This is indeed very cool!

Tijmen.

1:08 PM  
Blogger Unknown said...

(number of people using this)++

3:07 PM  
Blogger Unknown said...

Even better: in the function void handleUserStop(int sig_num), add the line
signal(sig_num,SIG_DFL);
so that pressing Ctrl-C twice does force the program to quit.

5:01 PM  
Blogger Arthur said...

Useful indeed, TJ!

11:21 PM  
