Thread: SIGSTOP signal being ignored - why?

  1. #1
    Registered User
    Join Date
    Sep 2003
    Posts
    31

    SIGSTOP signal being ignored - why?

    Hi,
    I have a program that monitors server load and sends SIGSTOP to a process group via killpg(pid, SIGSTOP) to reduce load, then sends SIGCONT to resume it. The problem is that one of the child processes in this process group will at times ignore SIGSTOP: all the other processes stop, but at random times this one does not. So all the other processes die off and this one process is left running. If you try to stop this process from the command line, for about 5 minutes it will mostly ignore SIGSTOP. Sometimes it will stop for a few seconds, but then it starts up again all by itself. After these 5 minutes or so it will stop normally again, well, mostly. It's driving me nuts because it's breaking the rules as far as I can tell.

    Can anyone tell me any reason why SIGSTOP might fail on this process, or on any process? And is there any way to work around such a situation, such as detecting when a process is likely to ignore SIGSTOP and waiting until its state changes back to normal?

    Please, any help will be so greatly appreciated I can't begin to explain.
    Last edited by brett; 06-25-2007 at 06:14 AM.

  2. #2
    Registered User
    Join Date
    Jun 2007
    Posts
    3
    Hello,

    Honestly, I have no idea why a process is ignoring SIGSTOP. This should never happen.
    So I cannot directly help you solve your problem, but I can propose a workaround:

    Since your intention is to reduce the server load, why don't you consider renice?
    You can change the priority of your process group to a very low one, and your server
    will start allocating most of its resources to the other processes. This way, it will
    be very responsive, and your process group will continue running normally, without
    ever stopping.

    When the load is lighter, the system resources that are not being used
    will be allocated back to these low-priority processes automatically. This way, you don't
    have to actively monitor anything at all.
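
    For doing the same thing from inside the monitoring program rather than from the shell, a minimal sketch using setpriority() with PRIO_PGRP might look like the following. The function name renice_group is made up for illustration, and error handling is kept to a bare minimum.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Set the nice value of every process in the process group `pgid`.
 * A pgid of 0 means "the caller's own process group".
 * Returns 0 on success, -1 on failure (errno is left set by setpriority).
 *
 * renice_group is a hypothetical helper name, not part of any library.
 */
int renice_group(pid_t pgid, int nice_value)
{
    if (setpriority(PRIO_PGRP, (id_t)pgid, nice_value) == -1) {
        perror("setpriority");
        return -1;
    }
    return 0;
}
```

    Calling renice_group(pgid, 19) would push the whole group to the lowest priority. Note that an unprivileged process can normally raise the nice value but not lower it again afterwards.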

  3. #3
    Registered User
    Join Date
    Sep 2003
    Posts
    31
    Thank you for your response.

    I'm looking into some other way of controlling server load. My real aim is to turn fast, CPU-intensive processes into slow, low-CPU processes. nice and renice are not effective enough, so I'm now researching other process scheduling settings, such as changing the timeslice a process gets, if they can be changed at all. What kinds of functions are available to modify scheduling, or to change the CPU intensity of a running process? What methods are available? Sorry, I'm not too knowledgeable about this stuff yet; I'm still reading up on it all. Any input is greatly appreciated, thanks in advance.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by brett View Post
    Hi,
    I have a program that monitors server load and sends SIGSTOP to a process group via killpg(pid, SIGSTOP) to reduce load, then sends SIGCONT to resume it. The problem is that one of the child processes in this process group will at times ignore SIGSTOP: all the other processes stop, but at random times this one does not. So all the other processes die off and this one process is left running. If you try to stop this process from the command line, for about 5 minutes it will mostly ignore SIGSTOP. Sometimes it will stop for a few seconds, but then it starts up again all by itself. After these 5 minutes or so it will stop normally again, well, mostly. It's driving me nuts because it's breaking the rules as far as I can tell.
    Possibilities:

    1. The process which fails to stop has changed its process group, so it never receives the signal.
    2. Some other process is SIGCONT'ing this process without your knowledge. Perhaps a debugger is hooked to it?
    3. There is some undocumented restriction on SIGSTOP with killpg()
    4. Your kernel is buggy.

    SIGSTOP is supposed to be unignorable. So I think the two most probable scenarios are that the signal is never reaching the process in the first place, or that some other process is restarting it.
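
    A minimal sketch of how those two scenarios could be checked from the monitoring program, assuming Linux (it reads /proc/<pid>/stat, which is not portable). The helper names in_group and proc_state are made up for illustration and are not from the original program. getpgid() reports the group a process currently belongs to, and a state letter of 'T' in /proc means the process is stopped by a signal.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: is `pid` still a member of `expected_pgid`?
 * Returns 1 if yes, 0 if it has moved to another group,
 * -1 if the lookup failed (for example, the process no longer exists).
 */
int in_group(pid_t pid, pid_t expected_pgid)
{
    pid_t pg = getpgid(pid);
    if (pg == (pid_t)-1)
        return -1;
    return pg == expected_pgid;
}

/*
 * Hypothetical helper (Linux only): return the state letter from
 * /proc/<pid>/stat, e.g. 'R' running, 'S' sleeping, 'T' stopped.
 * Returns '?' if the file cannot be read or parsed.
 */
char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is in parentheses and may itself contain ')',
       so search for the LAST ')' and take the field right after it. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

    Calling in_group(pid, pgid) before each killpg() would show whether the stubborn process has left the group. Polling proc_state(pid) after a SIGSTOP would show whether it ever reaches 'T' and how quickly something resumes it.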

  6. #6
    Registered User
    Join Date
    Sep 2003
    Posts
    31
    1. The process which fails to stop has changed its process group, so it never receives the signal.
    If this is the case, and I think it could be likely, how do you stop this from happening, and how could I force the process to return to (or remain in) the original process group?

    But one problem with this is the process even when running in its own process group just ignores SIGSTOP signals. I can execute "kill -STOP pid" or "kill -SIGSTOP pid" commands all I want but the process just ignores them. Shouldn't an strace show ALL signals being sent to a process? When this process is being strace'd (strace -tt -e trace=signal -p pid) shouldn't it show ALL signals? It never shows any signals trying to SIGCONT it.

  7. #7
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by brett View Post
    If this is the case, and I think it could be likely, how do you stop this from happening, and how could I force the process to return to (or remain in) the original process group?
    As far as I know there is no way to prevent a process from doing that.

    But one problem with this is the process even when running in its own process group just ignores SIGSTOP signals. I can execute "kill -STOP pid" or "kill -SIGSTOP pid" commands all I want but the process just ignores them. Shouldn't an strace show ALL signals being sent to a process? When this process is being strace'd (strace -tt -e trace=signal -p pid) shouldn't it show ALL signals? It never shows any signals trying to SIGCONT it.
    Hmm. Some googling shows that there may be weird interactions between strace and SIGCONT, but I'm not entirely sure they should apply in this case. At any rate, SIGCONT is an unusual kind of signal in that it forces the continuation of the process, while on the other hand, ptrace() is supposed to block a process whenever it receives a signal. There may be some dynamic going on there I'm not familiar with.

    Is it some particular process that always does this, or is it random? I wonder if one of the other processes in the group is periodically sending a SIGCONT and for some reason strace isn't showing that fact.

  8. #8
    Registered User
    Join Date
    Sep 2003
    Posts
    31
    It's the same process; the error is repeatable.

    What about signal masking? Or could using the ptrace function be my best bet at controlling functions?

    If you stop a child process can you do so without the parent knowing?

    The first problem I encountered was that stopping child processes just stuffed up the parent process. That's when I learnt to send SIGSTOP to process groups. But can you stop a child and make it appear to the parent that the child is still executing, so the parent just keeps on waiting?
    Last edited by brett; 06-25-2007 at 10:00 PM.

  9. #9
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by brett View Post
    What about signal masking? Or could using the ptrace function be my best bet at controlling functions?
    If you really want to stop the parent from receiving a child status when a child stops, then yes, the only way to do that is to ptrace() the parent and trap any SIGCHLD signal it receives. You'll have to dig inside the siginfo in order to figure out what child it was, and whether the SIGCHLD is relevant -- you don't want to block all SIGCHLD, because that might end up creating zombies.
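
    To illustrate which siginfo fields are involved, here is a minimal sketch of an ordinary SIGCHLD handler installed with SA_SIGINFO. It is not a ptrace() tracer and does not hide anything from the parent; it only shows that si_pid identifies the child and si_code says why the signal was sent (CLD_STOPPED, CLD_CONTINUED, CLD_EXITED and so on). The names install_sigchld_probe, cld_reason, last_pid and last_code are made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Filled in by the handler; hypothetical names for illustration. */
static volatile sig_atomic_t last_pid;   /* child the SIGCHLD was about */
static volatile sig_atomic_t last_code;  /* one of the CLD_* constants  */

static void on_sigchld(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    last_pid = (sig_atomic_t)info->si_pid;
    last_code = info->si_code;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_sigchld_probe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* SA_NOCLDSTOP is deliberately NOT set, so stops are reported too. */
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Translate an si_code value from SIGCHLD into a readable reason. */
const char *cld_reason(int code)
{
    switch (code) {
    case CLD_EXITED:    return "exited";
    case CLD_KILLED:    return "killed by signal";
    case CLD_DUMPED:    return "killed, core dumped";
    case CLD_TRAPPED:   return "traced child trapped";
    case CLD_STOPPED:   return "stopped";
    case CLD_CONTINUED: return "continued";
    default:            return "unknown";
    }
}
```

    A tracer would look at the same si_code value when deciding which SIGCHLD signals to suppress: only the CLD_STOPPED and CLD_CONTINUED ones for the child being throttled, so that exit notifications still reach the parent and no zombies are left behind.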

    This whole thing is becoming rather complicated -- are you sure there's no better way to do what you want? On the other hand, you're getting good experience programming Linux, so I'm not going to totally discourage you.
