Thread: Processes not dying

  1. #1
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445

    Processes not dying

    I have a server program that forks off child processes to handle network connections. for some reason, the child processes are not dying when I call exit(0). I use MySQL for database access, and it is high on my list of suspects, but at this point I have no proof that the MySQL client library is causing any problems. the network connections are getting closed, as evidenced by the fact that I can look in /proc/<pid>/fd and see that only 0, 1, and 2 (stdin, stdout, and stderr, in no particular order) remain. I am intercepting the SIGCHLD signal, which calls wait(), in order to collect the terminated processes, but for some reason, the processes don't go away. they still show up on the output of ps ax, and not as zombies, so somehow, the processes are still running. I've done everything I can think of to fix this, and I'm running out of ideas. Please give me a few suggestions of things to try.

  2. #2
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    That should be quite impossible. For testing, though, you can try calling _exit() instead of exit(), thus avoiding atexit()-registered functions, one of which might, if it's very badly behaved, call longjmp().

    But primarily I'd try making absolutely sure that the exit() call is reached at all.
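To make the exit() versus _exit() difference concrete, here is a small sketch, assuming a POSIX system. The helper names `at_exit_marker` and `atexit_handler_ran` are invented for this example. A forked child registers an atexit() handler that writes one byte into a pipe, then terminates one way or the other, and the parent checks whether the byte arrived.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, used by the handler */

/* atexit() handler: leave a one-byte marker so the parent can tell it ran. */
static void at_exit_marker(void)
{
    if (write(report_fd, "x", 1) != 1) {
        /* nothing useful to do on failure in this sketch */
    }
}

/* Fork a child that registers an atexit() handler and then terminates with
 * exit(0) or _exit(0). Returns 1 if the handler ran, 0 if it did not,
 * -1 on error. */
static int atexit_handler_ran(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(at_exit_marker);
        if (use_underscore_exit)
            _exit(0);                    /* skips atexit() handlers */
        exit(0);                         /* runs atexit() handlers */
    }

    close(fds[1]);                       /* parent keeps only the read end */
    char c;
    ssize_t n = read(fds[0], &c, 1);     /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

Calling `atexit_handler_ran(0)` should report that the handler ran, and `atexit_handler_ran(1)` that it did not, which is why swapping in _exit() is a quick way to rule out a misbehaving exit-time function.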
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You can also attach to a "dying" process in gdb for example - just "attach <pid>" after starting gdb without any arguments.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    I'm not setting any atexit() functions, but I'll definitely give the _exit() thing a try. also, I'm not especially familiar with gdb, but I'll try attaching to a process and see what happens.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Ah, you need a "beginners guide to GDB", ok, so after you attached to the process, you can "break" by pressing CTRL-C, and type "stack" or "backtrace" to list which function you are in, and the "call-stack" of functions leading up to that point.


  6. #6
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    type "stack" or "backtrace"
    Or "bt". It's the shortest variant. And when you've used command line debuggers a bit, you'll really appreciate short commands.

  7. #7
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote: Originally posted by matsp
    Ah, you need a "beginners guide to GDB", ok, so after you attached to the process, you can "break" by pressing CTRL-C, and type "stack" or "backtrace" to list which function you are in, and the "call-stack" of functions leading up to that point.
    I'll have to rebuild the project in debug mode, but that's only about 10 minutes' worth.... any recommendations of a good "beginner's guide" for gdb?

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    This may be a good place to start:
    http://www.cs.princeton.edu/~benjasik/gdb/gdbtut.html


  9. #9
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    If the processes are not in state 'Z' what state are they in?

    Maybe the mysql client library is blocking SIGCHLD for some reason.

  10. #10
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    most are showing 'S+' for their status, meaning they are running, but sleeping.

    as far as the mysql client library catching SIGCHLD, I doubt it. I'm definitely catching it in my handler.

    we believe we have tracked this issue down. the client program was calling recv() after having read all available data from the socket, causing it to block, and of course the server sits and waits on recv() for each client connection, which also blocks, so it looks like it wasn't actually a problem with the server after all. however, I have since added a thread that starts in each child process, and each time a request comes in from a client connection, it stores the time at which it occurred in a global variable, which the thread looks at every 5 seconds. if the difference between NOW and the stored time is greater than a specific timeout, the thread exits, and the process terminates. the client developers have since fixed this problem, and we are putting out our update today, after which we will see if it is fixed.
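One alternative to a watchdog thread, and not what is described in this post, is to give the socket a receive timeout with the `SO_RCVTIMEO` socket option, so that a blocking recv() fails with EAGAIN/EWOULDBLOCK when nothing arrives in time. A minimal sketch follows; the helper names `set_recv_timeout_ms` and `recv_or_timeout` and the `-2` return code are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Ask the kernel to fail a blocking recv() on fd once `ms` milliseconds pass
 * without data. A value of 0 means "no timeout" (block indefinitely).
 * Returns 0 on success, -1 on error. */
static int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Thin wrapper around recv().
 * Returns the number of bytes read, 0 if the peer closed the connection,
 * -1 on error, or -2 (hypothetical code for this sketch) on timeout. */
static ssize_t recv_or_timeout(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                       /* nothing arrived within the timeout */
    return n;
}
```

With this in place, each child could call `set_recv_timeout_ms()` once after accepting the connection and treat a `-2` result as "client went quiet", then close the socket and exit normally.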

  11. #11
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote: Originally posted by Elkvis
    most are showing 'S+' for their status, meaning they are running, but sleeping.

    as far as the mysql client library catching SIGCHLD, I doubt it. I'm definitely catching it in my handler.
    You catching it in your handler has nothing to do with whether mysql is blocking the signal. The two are independent. You could install a signal handler which would never be called, because the signal itself is blocked. But you say you've figured it out, so that's probably not what's happening.

    we believe we have tracked this issue down. the client program was calling recv() after having read all available data from the socket, causing it to block, and of course the server sits and waits on recv() for each client connection, which also blocks, so it looks like it wasn't actually a problem with the server after all. however, I have since added a thread that starts in each child process, and each time a request comes in from a client connection, it stores the time at which it occurred in a global variable, which the thread looks at every 5 seconds. if the difference between NOW and the stored time is greater than a specific timeout, the thread exits, and the process terminates. the client developers have since fixed this problem, and we are putting out our update today, after which we will see if it is fixed.
    So instead of using TCP's intrinsic back-off and timing systems you are hacking in your own? That makes no sense. The recv() WILL eventually return after a timeout.

    The problem seems to be the design of the protocol itself. You don't know there is no more data coming and blindly call recv().

  12. #12
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote: Originally posted by brewbuck
    You catching it in your handler has nothing to do with whether mysql is blocking the signal. The two are independent. You could install a signal handler which would never be called, because the signal itself is blocked. But you say you've figured it out, so that's probably not what's happening.
    the point I was trying to make is that I'm definitely catching the signal, because my SIGCHLD handler prints to the screen every time a child process terminates, and I can see it happening.

    Quote: Originally posted by brewbuck
    So instead of using TCP's intrinsic back-off and timing systems you are hacking in your own? That makes no sense. The recv() WILL eventually return after a timeout.

    The problem seems to be the design of the protocol itself. You don't know there is no more data coming and blindly call recv().
    straight from the manpage for recv() : (http://www.penguin-soft.com/penguin/...an2/recv.2.inc)
    "If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking."

    I have not set my sockets to non-blocking, therefore recv() should block until there is data available. Perhaps you have some suggestions for how I might take advantage of "TCP's intrinsic back-off and timing systems." Perhaps you could share them with me, rather than simply telling me that I'm doing it wrong.

    I eagerly await your advice.

  13. #13
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote: Originally posted by Elkvis
    straight from the manpage for recv() : (http://www.penguin-soft.com/penguin/...an2/recv.2.inc)
    "If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking."
    That is not the complete picture. recv() is a protocol-agnostic system, as is the entire socket layer in general. TCP/IP itself has underlying timeout features which will cause the connection to be treated as "dead" after a certain amount of time.

    I have not set my sockets to non-blocking, therefore recv() should block until there is data available. Perhaps you have some suggestions for how I might take advantage of "TCP's intrinsic back-off and timing systems." Perhaps you could share them with me, rather than simply telling me that I'm doing it wrong.
    I didn't describe how to do it, because no description is necessary. It's all automatic.

    Again, I think the fundamental problem is not having a way of knowing when the data is finished, leaving you to resort to such tricks. It's not robust -- what if data was merely delayed by a fraction of a second longer than your timeout? My suggestion is to modify the protocol so you know how many bytes to expect.

    Did not mean to sound snippy.
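A minimal sketch of the "know how many bytes to expect" suggestion, assuming a made-up wire format in which every message is preceded by a 4-byte big-endian length. This framing and the helper names `recv_exact`, `send_message`, and `recv_message` are illustrative assumptions, not the protocol used in this thread.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes from fd. Returns 0 on success, -1 on error or if
 * the peer closed the connection before len bytes arrived. */
static int recv_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;                   /* peer closed early */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes to fd. Returns 0 on success, -1 on error. */
static int send_exact(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the payload. */
static int send_message(int fd, const void *payload, uint32_t len)
{
    uint32_t netlen = htonl(len);
    if (send_exact(fd, &netlen, sizeof netlen) != 0)
        return -1;
    return send_exact(fd, payload, len);
}

/* Receive one message into buf (capacity cap). Returns the payload length,
 * or -1 on error, early close, or a message larger than cap. */
static ssize_t recv_message(int fd, void *buf, size_t cap)
{
    uint32_t netlen;
    if (recv_exact(fd, &netlen, sizeof netlen) != 0)
        return -1;
    uint32_t len = ntohl(netlen);
    if (len > cap)
        return -1;                       /* refuse oversized messages */
    if (recv_exact(fd, buf, len) != 0)
        return -1;
    return (ssize_t)len;
}
```

With framing like this the receiver never has to guess whether more data is coming: it calls recv() only for bytes the sender has already promised, so neither side ends up blocked waiting on the other after a complete message.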
