Signal and exception handling


  1. #1
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10

    Signal and exception handling

    Hello,

    I am running a very computationally expensive program that is parallelised and still runs for many days. It is therefore undesirable for it to stop if a runtime error occurs; precious CPU time on our Linux cluster would be lost. In order to handle abort() and other runtime problems gracefully, I have therefore installed a signal handler that will throw an exception instead of exiting right away. This exception can then be caught and the program can continue without wasting CPU time.

    Or so I thought.

    This is the problem I have:

    I install a signal handler that displays a message, then throw()s an exception. One line of code that is running somewhere calls abort(). This call is within a try statement. The corresponding catch(...) statement, however, is never reached. Instead, the OS calls terminate again, recursively calling the signal handler without ever terminating (until the stack overflows or something). This is the message I get (on stdout/stderr):

    Code:
    NeuroEvolution: Signal 6 (Aborted) received.  Throwing an exception...
    terminate called after throwing an instance of 'int'
    NeuroEvolution: Signal 6 (Aborted) received.  Throwing an exception...
    NeuroEvolution: Signal 6 (Aborted) received.  Throwing an exception...
    terminate called recursively
    NeuroEvolution: Signal 6 (Aborted) received.  Throwing an exception...
    NeuroEvolution: Signal 6 (Aborted) received.  Throwing an exception...
    terminate called recursively
    etc. (you get the picture). The "NeuroEvolution: ..." message is from my signal handler, which looks like this:

    Code:
    2292 void CNeuroEvolution::ThrowingSignalHandler (int signal)
    2293 {
    2294     cerr << "NeuroEvolution: Signal " << signal
    2295          << " (" << strsignal (signal) << ")"
    2296          << " received.  Throwing an exception..." << endl;
    2297     
    2298     throw (signal);
    2299 }
    It was installed using this code:

    Code:
    /// handle signals like ABRT and SEGV by having them throw an exception
    void CNeuroEvolution::InstallSignalHandler()
    {
        struct sigaction signal_handling;

        signal_handling.sa_handler = &ThrowingSignalHandler;

        sigfillset(&(signal_handling.sa_mask));
        signal_handling.sa_flags = SA_NOCLDSTOP;

        sigaction(SIGILL, &signal_handling, NULL);
        sigaction(SIGTRAP, &signal_handling, NULL);
        sigaction(SIGABRT, &signal_handling, NULL);
        sigaction(SIGFPE, &signal_handling, NULL);
        sigaction(SIGSEGV, &signal_handling, NULL);
    }
    I thought that filling the mask with sigfillset would prevent any recursive call to the handler. I have also tried sigemptyset in its place, with (apparently) no change to the situation.

    Here is the catch statement I would like to end up at (the abort occurs in a function called by a function called by a function called by DoSomething):

    Code:
    try
    {
        // Something that could go wrong
        DoSomething(data);
    }
    catch (...)
    {
        // something has gone wrong
        // avoid a program abort, print a message and do default action

        cerr << " NeuroEvolution: Error occurred, removing incorrect data... " << endl;

        continue;
    }
    This is the relevant part of the stack backtrace:

    Code:
    #32247 0x00002b93bc7ffbbb in PAC::CNeuroEvolution::ThrowingSignalHandler (signal=6) at NeuroEvolution.cpp:2298
    #32248 <signal handler called>
    #32249 0x00002b93bedb8aa5 in raise () from /lib64/libc.so.6
    #32250 0x00002b93bedb9e60 in abort () from /lib64/libc.so.6
    #32251 0x00002b93be8afe5b in std::set_unexpected () from /usr/lib64/libstdc++.so.6
    #32252 0x00002b93be8af24b in __cxa_bad_cast () from /usr/lib64/libstdc++.so.6
    #32253 0x00002b93be8afceb in __gxx_personality_v0 () from /usr/lib64/libstdc++.so.6
    #32254 0x00002b93bec84748 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
    #32255 0x00002b93bec848dc in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
    #32256 0x00002b93be8aff5d in __cxa_throw () from /usr/lib64/libstdc++.so.6
    #32257 0x00002b93bc7ffbbb in PAC::CNeuroEvolution::ThrowingSignalHandler (signal=6) at NeuroEvolution.cpp:2298
    #32258 <signal handler called>
    #32259 0x00002b93bedb8aa5 in raise () from /lib64/libc.so.6
    #32260 0x00002b93bedb9e60 in abort () from /lib64/libc.so.6
    #32261 0x00002b93be8afe5b in std::set_unexpected () from /usr/lib64/libstdc++.so.6
    #32262 0x00002b93be8af24b in __cxa_bad_cast () from /usr/lib64/libstdc++.so.6
    #32263 0x00002b93be8afceb in __gxx_personality_v0 () from /usr/lib64/libstdc++.so.6
    #32264 0x00002b93bec84748 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
    #32265 0x00002b93bec848dc in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
    #32266 0x00002b93be8aff5d in __cxa_throw () from /usr/lib64/libstdc++.so.6
    #32267 0x00002b93bc7ffbbb in PAC::CNeuroEvolution::ThrowingSignalHandler (signal=6) at NeuroEvolution.cpp:2298
    #32268 <signal handler called>
    #32269 0x00002b93bedb8aa5 in raise () from /lib64/libc.so.6
    #32270 0x00002b93bedb9e60 in abort () from /lib64/libc.so.6
    #32271 0x00002b93bedb2246 in __assert_fail () from /lib64/libc.so.6
    #32272 0x00002b93bc7e7045 in CLinearGenome::SortSubNetworks (this=0x7fffee3e6720, Start=0, End=108, 
        Visited=0x7fffee3e6450) at LinearGenome.cpp:1106
    I have spent a lot of time trying to get it to run and would be very happy if you could point me to a solution.

    Please let me know if you need any more details. I am using gcc (GCC) 4.1.0 (SUSE Linux).

    Many thanks in advance,

    Nils.

    PS: Please do not suggest to change the cause for the abort() call, I had that idea myself already :-) Most problems occur in a part of the code I have not written myself and it would be too time-consuming to debug all of that.

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    Signal handlers must return normally. Because they're abnormally called, you cannot expect the C++ exception mechanism to handle this correctly.

    Basically, those two don't go together. Truth is, there is very little you can do in signal handlers.
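
    To illustrate how little is safe in there: a minimal sketch of a handler that sticks to async-signal-safe operations, using write() in place of the cerr output from the original handler. The names report_signal, install_reporter and g_reports are made up for this example, and SIGUSR1 is used only so that it can be tried harmlessly.

```cpp
#include <signal.h>
#include <unistd.h>

// Hypothetical counter, only here so the effect of the handler can be observed.
static volatile sig_atomic_t g_reports = 0;

// The handler only calls write(), which is on the POSIX list of
// async-signal-safe functions, and then returns normally.
extern "C" void report_signal(int)
{
    static const char msg[] = "NeuroEvolution: signal received\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    g_reports = g_reports + 1;
}

// Installs the handler for one signal; returns true on success.
bool install_reporter(int signo)
{
    struct sigaction sa = {};      // zero-initialise every field
    sa.sa_handler = report_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}
```

    Calling install_reporter(SIGUSR1) followed by raise(SIGUSR1) prints the message once and increments g_reports; the handler then returns to the interrupted code.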
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10

    Unhappy but...

    Dear CornedBee,

    Thanks for your reply!

    Quote Originally Posted by CornedBee View Post
    Signal handlers must return normally. Because they're abnormally called, you cannot expect the C++ exception mechanism to handle this correctly.
    How can a signal handler that was called by abort() return? If it does, where would it go? Please elaborate a little bit on the reason so that I can understand it better.

    Quote Originally Posted by CornedBee View Post
    Basically, those two don't go together. Truth is, there is very little you can do in signal handlers.
    So, do you have any other suggestion what I might do in this case?

    Many thanks and regards,

    Nils.

  4. #4
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,420
    So in the midst of all this crashing, what makes you think that you're going to get the correct answer (eventually)?

    If the s/w is bad enough to cause exceptions (like the ones you're getting), it's definitely bad enough to give you the wrong answers.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Hang on... A call to abort is usually because the code is "horribly confused" and "can't figure out what to do". Surely the RESULT after "ignoring" such an abort() call would be completely undefined and unpredictable, and just result in further computation of undefined results? Or do you have some way of "figuring out which parts were affected and thus recover some valuable data"?

    I'm just sort of expecting the code that calls abort() to not be able to continue (in a meaningful way) where it came from anyways, even if you override the abort itself.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10
    Dear Salem,

    Quote Originally Posted by Salem View Post
    So in the midst of all this crashing, what makes you think that you're going to get the correct answer (eventually)?

    If the s/w is bad enough to cause exceptions (like the ones you're getting), it's definitely bad enough to give you the wrong answers.
    That's a good point, thanks for your feedback. However, in this case it really would make sense to continue: To sum it up, I am generating random data structures in a loop, then testing them by using them. If using them generates an error I discard the data structure, then continue the loop looking for a datum that does not create an error.

    So, it really does make sense in this special case. The data that causes the problem is discarded and the code works fine with the other data.

    Also, as I said earlier, I simply do not have the time to debug the faulty code (which I have not written myself) since it is too complex.

    If anybody has an idea how to do it in any other way please let me know.

  7. #7
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10

    yes

    Dear Mats,

    Quote Originally Posted by matsp View Post
    Hang on... A call to abort is usually because the code is "horribly confused" and "can't figure out what to do". Surely the RESULT after "ignoring" such an abort() call would be completely undefined and unpredictable, and just result in further computation of undefined results? Or do you have some way of "figuring out which parts were affected and thus recover some valuable data"?
    Exactly that, as I just wrote in my reply above. The problems only occur with specific data -- which, however, cannot easily be tested for this feature except by using it. This, in turn, creates the signal, which I can then use as a measure for the data, and remove it.

    Quote Originally Posted by matsp View Post
    I'm just sort of expecting the code that calls abort() to not be able to continue (in a meaningful way) where it came from anyways, even if you override the abort itself.
    ... and this is why I got the idea with the exception! The catch statement knows exactly what to do and is placed at the correct level to continue in a meaningful way.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Ok, obviously me and Salem posted at the same time.

    So, presumably this is something like "the calculation on the input values leads to an incorrect result [obvious ones are: divide by zero, negative square root, but of course any partial result that is invalid for this particular application]" under some circumstances, and when the (large external) code finds this, it calls abort().

    Are there many different places that cause this call to abort? Would it be possible to catch this in a different manner?

    I'm just dubious that the current approach will actually work right.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    This probably isn't possible, or at least not easy, but is there a way to redefine abort() to call your own abort_exception() function?

    I've seen some C code that does something like this:
    Code:
    #define malloc  myMalloc
    I'm guessing you'd have to do that in all the files that call abort() though... So I guess I'm wondering if there's a way to globally re-link all the object files to call your abort() instead of the regular C abort()?

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    cpjust makes a good point: just replace "abort()" with your own function - and you don't really need a different name, as long as you link that object file before the standard C library, it should take your abort instead of the other one.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  11. #11
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by matsp View Post
    cpjust makes a good point: just replace "abort()" with your own function - and you don't really need a different name, as long as you link that object file before the standard C library, it should take your abort instead of the other one.

    --
    Mats
    This sounds like an interesting experiment, so I'll probably try it out when I get some time.
    Do you know if that would still work if the code that's calling abort() was in a separate .lib file that's being linked or would the .lib file need to be recompiled first?

    Also, wouldn't you get Linker errors about abort() already defined in file blah...? I've always found those kinds of errors a pain in the ass to get rid of.

  12. #12
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10

    yes, maybe...

    Quote Originally Posted by cpjust View Post
    This probably isn't possible, or at least not easy, but is there a way to redefine abort() to call your own abort_exception() function?

    I've seen some C code that does something like this:
    Code:
    #define malloc  myMalloc
    I'm guessing you'd have to do that in all the files that call abort() though... So I guess I'm wondering if there's a way to globally re-link all the object files to call your abort() instead of the regular C abort()?
    Dear cpjust,
    Dear Mats,

    Thanks for your input. Maybe it is possible to do something along those lines. I am unsure whether it is possible to shadow abort since I am not calling it directly. Also, sometimes the code gets SEGV. However, I will try to work in that direction and post the outcome. I was also thinking I could set some flag instead of throwing an exception (if exception handling needs a valid stack to backtrace and the signal handler caller corrupts it), then hope that the code somehow reaches the place where my catch is, and query the flag there instead of catch()ing.

    Thanks again for your help, everyone who has replied!

    Kind regards,

    Nils.

  13. #13
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,230
    Quote Originally Posted by CornedBee View Post
    Signal handlers must return normally. Because they're abnormally called, you cannot expect the C++ exception mechanism to handle this correctly.
    There have been some attempts to make it work. None have been 100% successful that I've heard of. The GNU C library allows you to longjmp() out of a signal handler, so it seems theoretically possible to throw an exception from a signal handler, too.
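
    A rough sketch of the longjmp route, assuming a POSIX system where leaving a SIGABRT handler via siglongjmp() is permitted. The names on_sigabrt, risky_step and count_good are invented for illustration; risky_step stands in for the external code that asserts. Note that the jump skips destructors and leaves any locks or half-updated data in the skipped frames as they were, so this is only defensible when those frames hold nothing that needs cleanup.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <cstdlib>

// Recovery point that the SIGABRT handler jumps back to.
static sigjmp_buf g_recovery_point;

// Never returns: control resumes at the matching sigsetjmp() call.
extern "C" void on_sigabrt(int)
{
    siglongjmp(g_recovery_point, 1);
}

// Hypothetical stand-in for the external code: aborts on "bad" data.
void risky_step(int datum)
{
    if (datum % 3 == 0)
        std::abort();
}

// Tries the data 1..n and counts how many survive risky_step().
int count_good(int n)
{
    struct sigaction sa = {};
    sa.sa_handler = on_sigabrt;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, nullptr);

    // Locals modified between sigsetjmp and siglongjmp must be volatile.
    volatile int good = 0;
    for (volatile int i = 1; i <= n; i = i + 1) {
        // Saving the signal mask (second argument 1) means the jump also
        // unblocks SIGABRT again, so later aborts are still caught.
        if (sigsetjmp(g_recovery_point, 1) == 0) {
            risky_step(i);
            good = good + 1;
        }
        // A non-zero return means risky_step aborted: discard this datum.
    }
    return good;
}
```

    With this stand-in, count_good(9) skips the data 3, 6 and 9 and counts the other six.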

    However, any such method would be fraught with difficulty. Another thing to consider is that if the signal handler is processing an ASYNCHRONOUS signal, you have no idea when this signal occurs. You might be in the middle of constructing an object. You might be in the middle of a memory allocation or something. Normal C++ exceptions only occur at well-defined points in execution, but an exception triggered from an asynchronous signal handler violates these rules.

    In short, I don't think this is going to work at all. Instead, rely on the tried-and-true method of setting some volatile flag variable inside your signal handler, which you periodically check (probably in some mid-level loop somewhere). If you see that the flag is set, THEN you throw an exception.
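
    A minimal sketch of that flag approach, with made-up names (g_signal_seen, note_signal, throw_if_signalled, run_items) and SIGUSR1 standing in for whatever asynchronous signal is of interest. It assumes the handler is allowed to return; that holds for signals such as SIGINT or SIGUSR1, but not for the SIGABRT raised by abort(), which still terminates the process after the handler returns.

```cpp
#include <signal.h>
#include <stdexcept>
#include <string>

// Set by the handler, polled by ordinary code.
static volatile sig_atomic_t g_signal_seen = 0;

// The handler only records the signal number and returns.
extern "C" void note_signal(int signo)
{
    g_signal_seen = signo;
}

// Called at well-defined points; turns a pending flag into an exception.
void throw_if_signalled()
{
    if (g_signal_seen != 0) {
        int signo = g_signal_seen;
        g_signal_seen = 0;
        throw std::runtime_error("signal " + std::to_string(signo) + " received");
    }
}

// Processes n items and returns how many were discarded because of a signal.
int run_items(int n)
{
    struct sigaction sa = {};
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);

    int discarded = 0;
    for (int i = 0; i < n; ++i) {
        try {
            if (i % 2 == 1)
                raise(SIGUSR1);    // stand-in for a signal arriving during the work
            throw_if_signalled();  // periodic check in the mid-level loop
        } catch (const std::runtime_error&) {
            ++discarded;           // discard this item and carry on
        }
    }
    return discarded;
}
```

    The exception is now thrown from normal code rather than from the handler, so stack unwinding and the catch block behave as usual.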

  14. #14
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,230
    Quote Originally Posted by nts View Post
    Thanks for your input. Maybe it is possible to do something along those lines. I am unsure whether it is possible to shadow abort since I am not calling it directly.
    Nothing should be calling it. I don't know of any C library routines which call abort(). If you're not calling it, nobody is calling it.

    Also, sometimes the code gets SEGV.
    Under UNIX, the result of ignoring a SIGSEGV signal which was not generated with raise() or kill() is undefined. It will probably lead to an infinite loop. Here's what happens:

    1. Code attempts to access bogus memory location
    2. Processor signals an interrupt, the SIGSEGV handler is called
    3. Handler returns, CPU goes to the same instruction again
    4. Go to step 1

    I was also thinking I could set some flag instead of throwing an exception
    yes, Yes, YES.

  15. #15
    nts
    Registered User
    Join Date
    Nov 2007
    Posts
    10
    Dear Brewbuck,

    Quote Originally Posted by brewbuck View Post
    There have been some attempts to make it work. None have been 100% successful that I've heard of. (...)
    Thanks, I also googled that and still had some hope left that I could make it work. I suspect that the exception handling fails because it relies on a clean stack state, i.e. it unwinds frame by frame instead of doing a longjmp-like jump straight to the catch?

    Quote Originally Posted by brewbuck View Post
    In short, I don't think this is going to work at all. Instead, rely on the tried-and-true method of setting some volatile flag variable inside your signal handler, which you periodically check (probably in some mid-level loop somewhere). If you see that the flag is set, THEN you throw an exception.
    That's probably the best way, thanks very much.

    I will try some more and keep (all of) you posted.

    Best regards,

    Nils.
