Thread: No call to execv

  1. #1
    Registered User
    Join Date
    May 2007

    No call to execv


    I've had a rather mysterious problem for quite some time now.
    I have a program that performs calls to underlying resources,
    e.g., SQL*Plus. The HW is Sun Fire v440, Sun OS 9.

    Everything works perfectly when the underlying resources are
    available and functional. But when they are not, the program
    keeps failing.

    Basically, the parent process reads data from a file that is opened
    by freopen. The reason for using freopen is to redirect the child's
    stdout to this file. And it works OK when resources are available.

    But it seems that, if the call to the underlying resource fails once, it
    keeps on failing. All the time. A workaround is to restart the program,
    but even this fails sometimes..

    The main problem seems to be that the actual call to the underlying
    resource is not performed, so nothing is written to the redirected
    stdout and the file stays at 0 bytes.

    The problem always happens upon reboot of a server. If my program
    does not wait for the underlying resources, it'll keep on failing.

    Here's the (pseudo) code:

    while (runForever) {
       if child
          freopen(file, "w", stdout);
          execv(underlying resource);
       if parent
          sleep(untilChildShouldBeDone); // Wait if child's call takes a while...
          kill(theChild); // If child's underlying call hangs, it must be terminated...
          wait(null); // Avoid the defunct processes..
    }

    So, it works when the resources are OK. But if they are not, it fails once and
    keeps on failing. It seems like the actual execv call is simply not performed.

    Does anyone have any idea of what might be wrong; how to handle it, or even
    how to achieve the same task differently..?


  2. #2
    Salem
    Join Date
    Aug 2001
    The edge of the known universe
    > freopen(file, "w", stdout)))
    Is this the same file for all children, or does each one get a different file?
    Check the return result for errors.

    > execv (underlying resource);
    Check the return result for errors. exec() calls do return if they fail.
    One obvious possibility is that your process table is full.
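
    To make "check the return result" concrete, here is a minimal sketch of the child side. The names spawn_to_file and reap_exit_code are made up for illustration, and the exit codes 126/127 are just borrowed from the shell convention; nothing here comes from the original program. The key point is that the child reports errno and calls _exit() if either freopen() or execv() fails, so it can never fall back into the parent's loop.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that redirects stdout to out_path
 * and execs prog. Returns the child's pid in the parent, or -1 if
 * fork() failed. The child never returns from this function. */
pid_t spawn_to_file(const char *out_path, const char *prog, char *const argv[])
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or fork() error (-1) */

    if (freopen(out_path, "w", stdout) == NULL) {
        fprintf(stderr, "freopen(%s): %s\n", out_path, strerror(errno));
        _exit(126);                 /* assumed code for "could not redirect" */
    }
    execv(prog, argv);
    /* execv() only returns on failure. */
    fprintf(stderr, "execv(%s): %s\n", prog, strerror(errno));
    _exit(127);                     /* assumed code for "could not exec" */
}

/* Hypothetical helper: reap the child and return its exit code,
 * or -1 if it did not exit normally (e.g. it was killed by a signal). */
int reap_exit_code(pid_t pid)
{
    int status;
    pid_t r;
    while ((r = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
        ;                           /* retry if interrupted by a signal */
    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    With something like this, an exit code of 127 in the parent tells you the exec itself failed, and the message on stderr tells you why.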

    > kill (theChild);
    Are you doing something which can be 'caught' by the child, in order to perform some kind of cleanup, or are you just being brutal and shooting it dead no matter what? Maybe it's just dying whilst resources are still in a locked state.

    The fact that you have such an approach is worrying. Why do you have so many child processes which are likely to fail within some expected time period?

  3. #3
    brewbuck
    Join Date
    Mar 2007
    Portland, OR
    Originally posted by protocol78:
    while (runForever) {
       if child
          freopen(file, "w", stdout);
          execv(underlying resource);
       if parent
          sleep(untilChildShouldBeDone); // Wait if child's call takes a while...
          kill(theChild); // If child's underlying call hangs, it must be terminated...
          wait(null); // Avoid the defunct processes..
    }

    If this pseudocode is correct, you have some serious problems. If the child's execv() fails, it will continue the loop and the child itself will fork again. If execv() continues to fail, this is basically a fork bomb.

    Also, the parent code is critically broken because it calls wait() unconditionally. Again, if execv() fails, there is no path for any of the children to exit, except by your killing signal.

    So this loop spews out a tree of processes, with the left nodes being parents that destroy the tree from the root down, and the right nodes being children who continue to generate more tree as fast as they can. This is obviously not good.
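
    For the parent side, one possible shape is sketched below: wait for that specific child with a deadline instead of sleep/kill/wait. The names wait_or_kill and poll_child, the 100 ms polling interval, and the SIGTERM-then-SIGKILL escalation are my own assumptions, not anything from the original code. It only signals a child that is still running, and it always reaps exactly the pid it was given.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll for up to `seconds`. Returns 1 if the child was reaped,
 * 0 if it is still running, -1 on a waitpid() error. */
static int poll_child(pid_t pid, unsigned seconds, int *status)
{
    const struct timespec pause_100ms = { 0, 100 * 1000 * 1000 };
    time_t deadline = time(NULL) + (time_t)seconds;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;
        if (r == -1 && errno != EINTR)
            return -1;
        if (time(NULL) >= deadline)
            return 0;
        nanosleep(&pause_100ms, NULL);
    }
}

/* Hypothetical helper: give the child timeout_sec to finish; if it is
 * still running, send SIGTERM, allow grace_sec, then send SIGKILL.
 * Returns 0 if the child ended on its own, 1 if it had to be
 * signalled, -1 on error. *status receives the waitpid() status. */
int wait_or_kill(pid_t pid, unsigned timeout_sec, unsigned grace_sec, int *status)
{
    int r = poll_child(pid, timeout_sec, status);
    if (r != 0)
        return r == 1 ? 0 : -1;

    kill(pid, SIGTERM);             /* polite request first */
    r = poll_child(pid, grace_sec, status);
    if (r == 0) {
        pid_t w;
        kill(pid, SIGKILL);         /* cannot be caught or ignored */
        while ((w = waitpid(pid, status, 0)) == -1 && errno == EINTR)
            ;
        r = (w == pid) ? 1 : -1;
    }
    return r == 1 ? 1 : -1;
}
```

    Combined with a child that always calls _exit() after a failed exec, the loop then has exactly one live child per iteration and no leftover zombies.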
    Last edited by brewbuck; 05-04-2007 at 09:53 AM.

  4. #4
    Registered User
    Join Date
    May 2007
    This thing is just a mystery...

    Some answers to you Salem:

    Well, the file is the same for all children and freopen does not fail...
    if (0 == (fp = freopen("filename", "w", stdout))) {
        // not good, but we never get here...
    }
    Well.. the thing with kill is that I somehow have
    to stop waiting for the child in case the child's
    call to execv hangs forever.. I've even tried

    int retv;
    waitpid(childpid, &retv, 0);

    but the main problem is still there (0-byte file size)...

    Ok, I can understand your concerns regarding spawning off
    the children all the time... I'll try to explain why:

    Let's say the requirement is to develop a C program that calls,
    for example, SQL*Plus. It must do this every 10 minutes or
    so to get some status. If we knew SQL*Plus would never hang,
    there would be no need to fork off a child, right? But since
    the underlying resource may hang, someone has to do the job
    for the parent, and the parent must end it if it seems to be
    hanging forever...

    That's why this approach was used...

    The only thing that works right now is to restart the application,
    but I don't think this is a very elegant solution. If you restart it
    manually from a terminal it always works...

    So, since the only current solution is to restart the program, I
    decided that if the file stays at 0 bytes, the program should exit
    after some attempts and let cron restart it.

    But even this fails. If cron starts the program again, the same
    behaviour is still there, which just confuses me even more..

    So right now I'm kind of stuck with my approach.. any suggestions
    would be appreciated..


  5. #5
    Registered User
    Join Date
    May 2007
    Here are some answers to brewbuck:

    The call to execv does not fail, it is checked:

    int ret;
    ret =  execv("path_to_3rd_party_binary", cmd);
    // Check ret

    The reason wait(null) is there, after the kill, is that a lot of
    defunct processes appeared in the system if the wait was not there...

    And that is interesting, because:

    The child sets up a signal handler:

    // Remove anachronism warning
    extern "C" {
        typedef void (*funPtr)(int);
        void terminateOnSignal(int signo) {
            // ...
        }
    }

    // In child:
    signal(SIGTERM, reinterpret_cast<funPtr>(terminateOnSignal));

    So when the parent calls kill, the child's signal handler will execute.

    So it just seems that if a timeout or anything occurs in the call to
    execv, the calls will keep on failing... And even if I _exit the children
    upon kill there will be defunct processes if wait is not called.

    Some questions:

    Should I set up signaling differently between child and parent?
    Could I use anything else but _exit(1) when exiting the application..?

    As mentioned, the strange thing is that if I restart the program from
    a terminal it continues to work as anticipated, but if I restart it using
    cron, or with a little script that runs it again when the program has
    gone down, it fails. Nothing is written to the file; it stays at
    0 bytes.
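
    One difference between a terminal and cron that may be worth ruling out is the environment: cron starts jobs with a minimal environment, and execv passes whatever the parent has on to the child. Whether that is the cause here is not established anywhere in this thread, so treat the following as a diagnostic sketch only. The helper name report_missing_env is hypothetical, and which variables the underlying tool actually needs is an assumption you would have to fill in.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a line to `log` for each variable in
 * `names` that is unset or empty, and return how many were missing. */
size_t report_missing_env(const char *const names[], size_t count, FILE *log)
{
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        const char *value = getenv(names[i]);
        if (value == NULL || value[0] == '\0') {
            fprintf(log, "environment variable %s is not set\n", names[i]);
            missing++;
        }
    }
    return missing;
}
```

    Calling this at startup with a list such as PATH, LD_LIBRARY_PATH and ORACLE_HOME (assumed names, adjust to the real tool) and logging to a file would show whether the cron-started instance sees the same environment as the terminal-started one.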

