I would first like to say a huge thank you to all of you in these forums. You have been very helpful in the past, and it's great to have this kind of community online.
Ok now to the question:
I'm using kill(pid, SIGSTOP) to stop a process group (job) when the server load gets too high, and then restart it when the load goes down again. I just ran my program and it worked perfectly, but its behaviour is not reliable. Sometimes stopping and restarting the process group seems to cause a failure: one of the child processes doesn't stop properly, so half of the processes stop and the other half don't. The process group then breaks apart and the job fails. I've tried running my program and watching closely what happens, but, as I said, this time it all worked perfectly.

So my question is: is there a time limit on how long a process group can stay in the "STOPPED" (suspended) state? On previous runs the load may have stayed high for longer, leaving the process group stopped for many minutes at a time, so I wondered whether a process can only be stopped for a limited time.
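For reference, here is a minimal sketch in C of stopping and resuming a whole process group, assuming all of the job's processes share one process group whose ID is known. The helper names signal_group and spawn_group_leader are hypothetical. Per POSIX, a negative pid argument to kill() sends the signal to every process in process group -pid (killpg() does the same), whereas a positive pid signals only that single process and leaves the other members of the job running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send sig to every process whose process group ID is pgid.
   kill() with a negative pid targets the whole group; a positive pid
   would signal only that single process. */
static int signal_group(pid_t pgid, int sig)
{
    if (pgid <= 1) {            /* kill(0, ...) hits our own group and  */
        errno = EINVAL;         /* kill(-1, ...) nearly every process   */
        return -1;
    }
    return kill(-pgid, sig);    /* equivalent to killpg(pgid, sig) */
}

/* Hypothetical helper for testing: fork a child that leads its own
   process group and then idles. SIGSTOP/SIGCONT stop and resume it,
   SIGKILL ends it. */
static pid_t spawn_group_leader(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);          /* child: new group with pgid == its pid */
        for (;;)
            pause();            /* no handlers installed, so never returns */
    }
    if (pid > 0)
        setpgid(pid, pid);      /* parent sets it too, avoiding a race */
    return pid;
}
```

In the controlling program this would be signal_group(pgid, SIGSTOP) when the load is high and signal_group(pgid, SIGCONT) when it drops. A parent can confirm the effect on its own children with waitpid() and WUNTRACED (reports stopped children) or WCONTINUED (reports resumed ones).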
Or can anyone think of another reason why the process group (job) would break down, leaving two groups of processes and a failure? How could I test for this and avoid the problem? I did notice that when it failed previously, the parent process became a zombie while the child processes kept running, and I could not even stop them manually from the command line. Is there any way to test whether a process is stoppable before actually stopping it? What would make a process unstoppable? Thank you very much to anyone who can offer any insight.
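One way to check after the fact whether every member of a group really stopped is sketched below. It is Linux-specific and assumes /proc is mounted and readable; the function names are hypothetical. It reads each process's one-letter state from /proc/<pid>/stat. SIGSTOP itself cannot be caught, blocked, or ignored, but a process that is already a zombie (Z) cannot be stopped, and one in uninterruptible sleep (D) only stops once it leaves that state. Stopping is also asynchronous, so the wrapper polls briefly instead of checking only once.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Linux-only: read the state letter (R, S, D, T, Z, ...) and the process
   group ID of pid from /proc/<pid>/stat. Returns 0 on success. */
static int read_proc_stat(pid_t pid, char *state, long *pgrp)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                      /* process gone or not visible */
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The command name sits in parentheses and may contain spaces,
       so parse from the last ')': state, ppid, pgrp follow in order. */
    char *p = strrchr(buf, ')');
    long ppid;
    if (p == NULL || sscanf(p + 1, " %c %ld %ld", state, &ppid, pgrp) != 3)
        return -1;
    return 0;
}

/* Count members of group pgid whose state is not 'T' (stopped).
   Zombies ('Z') and processes in uninterruptible sleep ('D') count too. */
static int count_unstopped(pid_t pgid)
{
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long pid = strtol(e->d_name, &end, 10);
        if (*end != '\0' || pid <= 0)
            continue;                   /* skip non-PID entries like "self" */
        char state;
        long pgrp;
        if (read_proc_stat((pid_t)pid, &state, &pgrp) == 0 &&
            pgrp == (long)pgid && state != 'T')
            count++;
    }
    closedir(d);
    return count;
}

/* Hypothetical wrapper: stop group pgid, then poll for up to ~0.5 s and
   return how many members are still not stopped, or -1 on error. */
static int stop_and_verify(pid_t pgid)
{
    if (pgid <= 1 || pgid == getpgrp())
        return -1;                      /* never stop ourselves or everyone */
    if (kill(-pgid, SIGSTOP) != 0)
        return -1;
    int left = -1;
    for (int i = 0; i < 50; i++) {
        left = count_unstopped(pgid);
        if (left <= 0)
            break;                      /* all stopped, or /proc unreadable */
        struct timespec ts = {0, 10 * 1000 * 1000};   /* 10 ms */
        nanosleep(&ts, NULL);
    }
    return left;
}
```

A non-zero result means some member of the group never reached the stopped state. The same letters appear in the STAT column of ps -eo pid,pgid,stat,cmd if you prefer to check by hand while the job is stopped.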