I have a program that monitors server load and, to reduce load, sends SIGSTOP to a process group via killpg(pid, SIGSTOP), then later sends SIGCONT to let it run again. The problem is that one of the child processes in this group sometimes ignores the SIGSTOP. All the other processes stop, but at random times this one does not, so the rest of the group goes quiet and this one process is left running on its own. If I try to stop it from the command line, it mostly ignores SIGSTOP for about 5 minutes; occasionally it stops for a few seconds and then starts running again by itself. After those 5 minutes or so it stops normally again (well, mostly). It's driving me nuts because, as far as I can tell, it's breaking the rules.
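For reference, the pause/resume step described above boils down to something like the minimal C sketch below. The helper names pause_group and resume_group are hypothetical; the only real calls are killpg(), SIGSTOP and SIGCONT. Two documented facts are worth keeping in mind: SIGSTOP cannot be caught, blocked or ignored, and killpg() only reaches processes that are members of the given process group at the moment it is called.

```c
#define _XOPEN_SOURCE 700   /* expose killpg() on strict-standards builds */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: send SIGSTOP to every process in process group pgid.
 * Returns 0 if the signal was sent to at least one process, or -1 with
 * errno set by killpg() (e.g. ESRCH if no process is left in that group). */
int pause_group(pid_t pgid)
{
    return killpg(pgid, SIGSTOP);
}

/* Hypothetical helper: send SIGCONT to the same group so it runs again.
 * Same return convention as pause_group(). */
int resume_group(pid_t pgid)
{
    return killpg(pgid, SIGCONT);
}
```

Checking the return value and errno at least tells the monitor whether the signal reached any process at all; it does not confirm that each individual member actually stopped.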
Can anyone tell me why a SIGSTOP signal might fail on this process, or on any process? And is there any way to work around this? For example, can I detect when a process is likely to ignore SIGSTOP and wait until its state returns to normal?
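One way to observe the state change asked about here, assuming Linux, is to poll the state letter in /proc/<pid>/stat: 'T' means stopped, 't' tracing stop, and 'D' uninterruptible sleep (a stop signal sent to a process in 'D' only takes effect once the kernel call it is blocked in returns). This is a diagnostic sketch, not a known fix; the helper names stat_line_state, proc_state and wait_until_stopped, the 100 ms poll interval and the timeout are all hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L   /* expose nanosleep() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Extract the state letter (field 3) from one line of /proc/<pid>/stat.
 * Field 2 is the command name in parentheses and may itself contain
 * spaces or ')', so the state is read after the LAST ')'.
 * Returns 0 if the line does not look like a stat line. */
char stat_line_state(const char *line)
{
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Linux only: return the current state letter of pid ('R' running,
 * 'S' sleeping, 'D' uninterruptible sleep, 'T' stopped, 't' tracing stop,
 * 'Z' zombie, ...), or 0 if /proc/<pid>/stat cannot be read. */
char proc_state(pid_t pid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return stat_line_state(buf);
}

/* Poll pid about every 100 ms until it is stopped or roughly timeout_ms
 * has elapsed. Returns 1 if stopped, 0 on timeout, and -1 if the process
 * is gone (unreadable, zombie or dead). */
int wait_until_stopped(pid_t pid, int timeout_ms)
{
    const struct timespec delay = { 0, 100L * 1000000L };  /* 100 ms */
    for (int waited = 0; ; waited += 100) {
        char s = proc_state(pid);
        if (s == 0 || s == 'Z' || s == 'X')
            return -1;
        if (s == 'T' || s == 't')
            return 1;
        if (waited >= timeout_ms)
            return 0;
        nanosleep(&delay, NULL);
    }
}
```

For example, after killpg(pgid, SIGSTOP) the monitor could call wait_until_stopped(child_pid, 2000) for the troublesome child (child_pid being a hypothetical variable holding its PID) and log proc_state(child_pid) whenever the call returns 0, which would show what the process is doing while it refuses to stop.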
Any help would be greatly appreciated.