After forking, parent program gets signal 17 only when under gdb



charumbem
06-14-2017, 11:01 PM
Strange issue I've been dealing with for a few hours here, thought I would post instead of just lurking since I've got a lot of good info from reading threads here!

I have a program that forks itself to create a worker which it communicates with over a set of pipes. The forked process starts fine and begins its work, which involves making libcurl requests and sending the results back to the parent process over one of the pipes.
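
For readers following along, the fork-plus-pipe arrangement described above can be sketched roughly as follows. This is a minimal illustration and not the poster's actual code: the helper name run_worker_once is hypothetical, only a single one-way pipe is used, and the "work" is just writing a fixed string instead of making a libcurl request.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a worker that writes `msg` into a pipe, then
 * read it back in the parent. Returns the number of bytes read (the result
 * is NUL-terminated in `out`), or -1 on error. */
ssize_t run_worker_once(const char *msg, char *out, size_t outsz)
{
    assert(msg != NULL && out != NULL && outsz > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: close the unused read end, send the result, exit. */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        close(fds[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Parent: close the write end so read() reports EOF once the child
     * has closed its copy. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(fds[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);

    /* Reap the child so it does not linger as a zombie. */
    int status;
    waitpid(pid, &status, 0);
    return (ssize_t)total;
}
```

Note that the parent receives SIGCHLD whenever the child exits, including a normal exit, so the signal by itself does not prove the worker crashed.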

The issue I'm running into is that, while this all works fine when run outside of gdb, as soon as I run it under gdb, signal 17 (SIGCHLD) is raised. From what I can tell, this happens during the libcurl request. My code catches the signal and restarts the worker, which crashes at the same point.

I'm pretty new with gdb, so I'm not sure how to go about debugging this or what could possibly cause this signal to be fired only when run under gdb. It doesn't make a lot of sense to me.

A crash does occur after the program has run for several minutes, but this seems unrelated (I'm sure it's related).

Hoping someone has seen something similar where this kind of signal shows up during gdb debugging unexpectedly, and might be able to send me in the right direction.

Thanks for any input!

Salem
06-15-2017, 01:43 AM
https://sourceware.org/gdb/onlinedocs/gdb/Forks.html

Does this happen if you just let the system run within gdb, or only when you start setting breakpoints, stopping and starting etc?

Which process are you debugging - the parent or child?


(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".


> A crash does occur after the program has run for several minutes, but this seems unrelated (I'm sure it's related).
Is this because the pipes are full, because the debugged parent isn't reading information fast enough to keep up with the free-running child?
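
To make the "pipes are full" scenario concrete, here is a small sketch that measures how much data a pipe accepts when nobody reads from it. The helper name bytes_until_pipe_full is hypothetical, and the write end is switched to non-blocking mode only so the demonstration can detect the full condition (EAGAIN) instead of hanging. The exact capacity is system-dependent, so no specific number is assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `chunk_size` bytes at a time into a pipe that
 * nobody reads, until the kernel refuses more data. Returns the number of
 * bytes accepted before the pipe was full, or -1 on an unexpected error. */
long bytes_until_pipe_full(size_t chunk_size)
{
    char chunk[4096] = {0};
    assert(chunk_size > 0 && chunk_size <= sizeof chunk);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Non-blocking, so a full pipe yields EAGAIN instead of stalling. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    long total = 0;
    int full = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, chunk_size);
        if (n > 0) {
            total += n;
        } else if (n == -1 && errno == EINTR) {
            continue; /* interrupted by a signal, retry */
        } else {
            full = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return full ? total : -1;
}
```

With the default blocking mode, a writer that hits this limit simply stalls in write() until the reader drains the pipe, so a full pipe on its own would normally show up as a hang in the child and not as a crash.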

Then there are these gotchas:
https://curl.haxx.se/libcurl/c/CURLOPT_NOSIGNAL.html
https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT_MS.html

charumbem
06-15-2017, 08:05 AM
Thanks for your reply.

> Does this happen if you just let the system run within gdb, or only when you start setting breakpoints, stopping and starting etc?

The signal is sent to my main program if I just let it run under gdb, with no breakpoints or anything else set. It is a remote gdb session to an ARM machine, which is probably relevant if a very short timing issue is involved.

> Which process are you debugging - the parent or child?

I am debugging the parent process. I had tried setting follow fork mode:



set follow-fork-mode child


But the result is the same when I do this. It does not seem to actually attach to the child for some reason.

I just set CURLOPT_NOSIGNAL to 1 and this didn't seem to change anything. Also confirmed that I'm not setting any of curl's timeout options so it should be all defaults. I think the child is actually crashing legitimately and that if I could get gdb to debug it, I would see what the issue is.
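
One way to check whether the child is crashing legitimately, without getting gdb to attach to it, is to decode the status that waitpid() returns in the parent. SIGCHLD only says that a child changed state; the wait status says whether it exited normally or was killed by a signal. The sketch below is an illustration under assumptions: describe_status and run_child_and_wait are hypothetical names, and the child here is a stand-in that either exits with a code or raises a signal on itself.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a readable message. */
void describe_status(int status, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    if (WIFEXITED(status))
        snprintf(out, outsz, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(out, outsz, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(out, outsz, "unrecognized status 0x%x", (unsigned)status);
}

/* Hypothetical stand-in for the real worker: the child raises `sig` on
 * itself if `sig` is nonzero, otherwise it exits with `code`. The raw wait
 * status is stored in *status. Returns 0 on success, -1 on error. */
int run_child_and_wait(int sig, int code, int *status)
{
    assert(status != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (sig != 0)
            raise(sig);
        _exit(code);
    }

    if (waitpid(pid, status, 0) == -1)
        return -1;
    return 0;
}
```

In a real SIGCHLD-driven parent, the same decoding would be applied to the status obtained when reaping the worker, and logging it before restarting the worker would show whether the child died from something like SIGSEGV or exited on its own.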

>> A crash does occur after the program has run for several minutes, but this seems unrelated (I'm sure it's related).
> Is this because the pipes are full, because the debugged parent isn't reading information fast enough to keep up with the free-running child?

This one only occurs outside of gdb. Under gdb, the earlier crash prevents the curl request from completing, so the child never writes anything to its output pipe. When this crash outside of gdb happens, the parent program is able to read the full contents of the worker's output pipe three times (child writes to it every 30 seconds) before it crashes with an unexpected null pointer. This does seem like some kind of buffer overflow as so far it's always exactly three times (and the curl'd response size sent over the pipe has not varied during testing so far).
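
If a buffer overflow on the parent's read side is the suspect, one defensive pattern is to frame each message with its length and refuse anything that does not fit the destination buffer. The following is only a sketch under assumptions: the thread does not say how the worker's output is actually framed, so the 4-byte host-order length prefix and the names send_frame and recv_frame are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Read exactly `len` bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly `len` bytes, retrying on short writes and EINTR. */
static int write_exact(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical framing: a 4-byte host-order length, then the payload. */
int send_frame(int fd, const char *payload, uint32_t len)
{
    assert(payload != NULL);
    if (write_exact(fd, &len, sizeof len) == -1)
        return -1;
    return write_exact(fd, payload, len);
}

/* Returns the payload length (NUL-terminated in `out`), or -1 on error or
 * if the announced length would not fit in `out`. After an oversized frame
 * the stream is out of sync, so a real caller would restart the worker. */
int recv_frame(int fd, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    uint32_t len;
    if (read_exact(fd, &len, sizeof len) == -1)
        return -1;
    if (len >= outsz)
        return -1; /* refuse instead of overflowing the buffer */
    if (read_exact(fd, out, len) == -1)
        return -1;
    out[len] = '\0';
    return (int)len;
}
```

The point of the length check is that an unexpectedly large response turns into an explicit error in the parent, which is easier to diagnose than memory corruption that only shows up a few reads later.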

I'm going to see if I can get rid of this crash when not running under gdb since that at least seems debug-able and see if fixing that changes the behavior under gdb...

Salem
06-15-2017, 08:11 AM
Is there anything special about the remote, which means you can't debug the same s/w on a Linux host?

charumbem
06-15-2017, 09:59 AM
> Is there anything special about the remote, which means you can't debug the same s/w on a Linux host?

In theory no, nothing special about it. It does use a limited Yocto-based distro, but it's fairly feature-complete itself. The only barrier to testing on my normal Linux host is that I haven't set up the build environment to build for Intel yet.

Probably a good idea to ensure portability, etc. though. I've been under a bit of a time crunch and am still implementing other features, so I just haven't gotten to it. But I should do that.