Patience is a virtue, but I seem to lack it. I often run long computer simulations and keep working on my code while the program runs in a different window. The end result of a simulation is a set of data files containing the final solution.
What a waste of resources, though, when the simulation ends and it turns out to have gone unstable at an early stage, producing senseless data ever since. Or imagine you really have to leave the office and want to take your laptop home, but the simulation is only halfway through. Pressing Ctrl-C is easy, but then you have to restart the simulation from scratch at home. I would like to be able to save the solution data at any intermediate time.
I decided to have a look at the use of UNIX signals to solve these little annoyances.
The idea is neither original nor very difficult; it's just extremely handy! Most of my simulations consist of a large loop in the main routine that starts at simulation time Tstart and ends at time Tend, taking small time steps. Here's some sample terminal output to give an idea:
[..]
# 1967 time = 0.1986020
finished MM after 2 steps (mmres= 9.5645049E-006 ).
# 1968 time = 0.1986853
finished MM after 2 steps (mmres= 9.8183985E-006 ).
[..]
When I press Ctrl-Z ('suspend'), this is what happens:
finished MM after 2 steps (mmres= 9.7805565E-006 ).
# 2594 time = 0.2427970 (Ctrl-Z pressed)
Caught suspend. Saving current data...
Continuing execution...
finished MM after 2 steps (mmres= 9.9850709E-006 ).
# 2595 time = 0.2428664
And when I press Ctrl-C ('interrupt'), this is what happens:
finished MM after 5 steps (mmres= 1.0567982E-005 ).
# 2617 time = 0.2444400 (Ctrl-C pressed)
Caught interrupt. Saving current data...
Exiting.
Note that I deliberately use the SIGINT (Ctrl-C) and SIGTSTP (Ctrl-Z) signals, because the terminal sends those when the stated key combinations are pressed. The trade-off is that, once caught, these keys no longer interrupt or suspend the program by themselves. The reserved user signals SIGUSR1 and SIGUSR2 would have to be sent explicitly, for example with the kill command, which is inconvenient for my use.
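To illustrate why the user signals are less convenient, here is a small C++ sketch in which a process sends SIGUSR1 to itself with the POSIX kill() function, which is what `kill -USR1 <pid>` does from a shell. The names usr1_seen, on_usr1 and send_usr1_to_self are made up for this example.

```cpp
#include <csignal>
#include <signal.h>   // kill() (POSIX)
#include <unistd.h>   // getpid() (POSIX)

// Set by the handler when SIGUSR1 arrives.
volatile std::sig_atomic_t usr1_seen = 0;

void on_usr1(int) { usr1_seen = 1; }

// Install a handler for SIGUSR1, then send that signal to this very
// process. No key combination produces SIGUSR1, so somebody has to
// call kill() (or run the kill command) explicitly.
bool send_usr1_to_self()
{
    if (std::signal(SIGUSR1, on_usr1) == SIG_ERR) return false;
    if (kill(getpid(), SIGUSR1) != 0) return false;
    return usr1_seen == 1;
}
```

In practice you would first have to look up the process ID of the running simulation in a second terminal, which is exactly the inconvenience that Ctrl-C and Ctrl-Z avoid.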
Implementation in Fortran
On UNIX systems, signal handling is performed by the functions declared in the system-wide C header signal.h. Fortunately, libsigwatch provides some basic but sufficient helper functions that allow us to define which signals we want to catch and to find out when one has arrived, so that the program can take special action.
Here is my relevant Fortran (95) source code:
module mmsignal
  use mmio
  implicit none
contains

  ! Tell libsigwatch which signals to record for us.
  subroutine initsighandles()
    integer :: watchsignal   ! external function from libsigwatch
    integer :: status
    ! Signal numbers are platform-specific: SIGTSTP is 20 on most
    ! Linux systems, but 18 on macOS and the BSDs.
    status = watchsignal(2)  ! SIGINT  (Ctrl-C)
    status = watchsignal(20) ! SIGTSTP (Ctrl-Z)
    ! Optionally check for status == -1 to detect failure
  end subroutine

  ! Called once per time step from the main loop.
  subroutine catchsignals()
    integer :: getlastsignal ! external function from libsigwatch
    integer :: lastsig
    lastsig = getlastsignal()
    if (lastsig .eq. 2) then
      print *, 'Caught interrupt. Saving current data...'
      call save_state
      print *, 'Exiting.'
      stop
    end if
    if (lastsig .eq. 20) then
      print *, 'Caught suspend. Saving current data...'
      call save_state
      print *, 'Continuing execution...'
    end if
  end subroutine

end module
The routine watchsignal(int) installs special signal handling for the requested signal: each time the signal is received, its number is stored in a variable that can be read by calling the routine getlastsignal(). The response to a signal is not handled by the sigwatch library: I call my own routine catchsignals() within the main loop's body, which checks whether the most recent signal was SIGINT or SIGTSTP and otherwise does nothing.
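To make the mechanism concrete, here is a minimal C++ sketch of how such a "remember the last signal" helper could work. This is not libsigwatch's actual source; the namespace sketch and the names watch_signal, get_last_signal and record_signal are hypothetical and only mimic the behaviour described above.

```cpp
#include <csignal>

namespace sketch {   // hypothetical names, not the libsigwatch API

// Written by the handler, read by the main program.
volatile std::sig_atomic_t last_signal = 0;

// The handler does nothing but record which signal arrived.
void record_signal(int signum) { last_signal = signum; }

// Start recording the given signal; returns 0 on success, -1 on failure.
int watch_signal(int signum)
{
    return std::signal(signum, record_signal) == SIG_ERR ? -1 : 0;
}

// Return the most recently recorded signal (0 if none) and clear it.
// A signal arriving between the read and the reset would be lost;
// that is acceptable for a sketch polled once per time step.
int get_last_signal()
{
    int s = last_signal;
    last_signal = 0;
    return s;
}

} // namespace sketch
```

The main loop would call sketch::get_last_signal() once per iteration and branch on the result, just as catchsignals() does in the Fortran module. Because the handler only stores a number, all the real work (saving data, printing messages) happens in ordinary program context.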
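To include the sigwatch library in your program, just add it to your linking stage and you are done. Libraries are best listed after the object files that use them, since the order matters when linking against a static library:
> gfortran finvol.o mmadapt.o mmbounds.o mmconfig.o mmdata.o mmio.o mmphys.o mmsignal.o mmsolve.o mmusr.o -lsigwatch -o mmsolve
(The relevant parts are mmsignal.o and -lsigwatch.)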
Implementation in C(++)
In C++, we can directly use the functions in the (C-) signal library. I used this in a classroom package implementing genetic algorithms. The following code is incomplete, but all relevant parts are included:
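#include <csignal>
#include <iostream>
#include <unistd.h>   // write(), STDOUT_FILENO (POSIX)
#include "galib.h"

// Set by the signal handler and polled by the main loop.
// volatile sig_atomic_t is the type that is safe to write from a handler.
volatile std::sig_atomic_t stopGiven = 0;

void handleUserStop(int /* sig_num */)
{
    stopGiven = 1;
    // std::cout is not async-signal-safe, so print with write() instead.
    static const char msg[] =
        "*** User interrupt caught.\n"
        "*** Please stand by while current generation is completed and saved.\n";
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0) { /* nothing useful to do */ }
}

int main(int argc, char **argv)
{
    signal(SIGTSTP, handleUserStop);   // Ctrl-Z now requests a clean stop
    // ...
    while (!stopGiven) {
        p->nextGeneration();
        // ...
    }
    // In case of a user stop signal or when requested, save last generation
    if (stopGiven || parameters.savelastgen) {
        std::cout << "Writing generation #" << i << " to file \"" << LASTGENFILE << "\"." << std::endl;
        p->writePopulation(fpopout);
    }
    fpopout.close();
    delete p;
    return 0;
}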
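Notice that the custom signal handler is now registered directly through the signal library, in one step, by the call signal(SIGTSTP, handleUserStop); The handler does little more than set a flag; the main loop checks that flag after each generation, so the current generation is always completed before the data is saved.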
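On POSIX systems, sigaction() is often recommended over signal() because its behaviour is specified more precisely (for instance, the handler stays installed after it has fired). The following is a minimal sketch of the same flag-polling pattern built on sigaction(); it is an alternative, not the code from the classroom package. The names stop_requested, on_stop, install_stop_handler and run_loop are hypothetical, and raise(SIGTSTP) merely stands in for pressing Ctrl-Z.

```cpp
#include <csignal>
#include <signal.h>   // sigaction(), sigemptyset() (POSIX)

// Set by the handler, polled by the loop.
volatile std::sig_atomic_t stop_requested = 0;

void on_stop(int) { stop_requested = 1; }

// Install on_stop for the given signal; returns true on success.
bool install_stop_handler(int signum)
{
    struct sigaction sa = {};
    sa.sa_handler = on_stop;
    sigemptyset(&sa.sa_mask);   // block no extra signals in the handler
    sa.sa_flags = SA_RESTART;   // restart interrupted system calls
    return sigaction(signum, &sa, nullptr) == 0;
}

// Stand-in for the generation loop: runs until a stop is requested or
// max_steps is reached, and returns the number of completed steps.
int run_loop(int max_steps, int simulate_signal_at)
{
    int steps = 0;
    while (!stop_requested && steps < max_steps) {
        ++steps;                      // the "work" of one iteration
        if (steps == simulate_signal_at)
            raise(SIGTSTP);           // stands in for pressing Ctrl-Z
    }
    return steps;
}
```

After install_stop_handler(SIGTSTP), a call such as run_loop(100, 7) finishes the step during which the signal arrives and then leaves the loop, which is the point where the last generation would be written to file.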
Now that I have saved intermediate solution data, I can extend the main program to offer resume functionality, using the data file for initialization. That task is not only easy, it also has nothing to do with signals, so I will not cover it here.
I still benefit from the above functionality almost every week; I hope it serves you well too.