Thread: Debugging a rare / unreproducible bug..

  1. #1
    In the Land of Diddly-Doo g4j31a5's Avatar
    Join Date
    Jul 2006
    Posts
    476

    Debugging a rare / unreproducible bug..

    I don't know where else to ask this because it's not quite a programming question. Like the title says, how do I debug a rare / unreproducible bug? My application is behaving strangely. Sometimes it crashes back to Windows and shows the usual error dialog with the Send / Don't Send buttons (you know the one I'm talking about), yet sometimes it runs normally. The application has to run for at least a whole day, so it also needs to be robust. There's one more bug that shows up when I test it for a whole day: the screen goes black. I'm quite sure it's not a screen saver / hardware safe mode issue, because when I press CTRL+ALT+DEL the Task Manager shows up. So, can anybody help me here? How do you usually track down a bug that happens randomly like this? How do you usually test an application for robustness? Thanks in advance.
    ERROR: Brain not found. Please insert a new brain!

    “Do nothing which is of no use.” - Miyamoto Musashi.

  2. #2
    Registered Abuser
    Join Date
    Jun 2006
    Location
    Toronto
    Posts
    591
    Code review. It's probably a buffer overrun somewhere or a null pointer dereference. Some may suggest a debugger, but I don't think you're at that point yet. Instead, try looking through your code for "off-by-one" errors, bad loop conditions, etc. Go through all the major constructs and functions and ask yourself "under what circumstances can this possibly go wrong?". Short of random bit-flipping caused by cosmic rays, try to code for every contingency.
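
    To make that concrete, here is a minimal sketch of two of the usual suspects and one way to code for the contingency. The names copy_name and sum_first are made up for illustration; they are not from the original poster's application.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: copies src into dst without ever writing more
     * than dst_size bytes (terminating '\0' included). Returns 0 on
     * success, -1 on bad arguments or if src had to be truncated. */
    int copy_name(char *dst, size_t dst_size, const char *src)
    {
        if (dst == NULL || src == NULL || dst_size == 0)
            return -1;                  /* contingency: bad arguments */

        size_t len = strlen(src);
        int truncated = 0;
        if (len >= dst_size) {          /* ">=" leaves room for '\0'; ">" would be off by one */
            len = dst_size - 1;
            truncated = 1;
        }
        memcpy(dst, src, len);
        dst[len] = '\0';
        return truncated ? -1 : 0;
    }

    /* Hypothetical helper: sums the first n elements. The classic mistake
     * is "i <= n", which reads one element past the end of the array. */
    long sum_first(const int *values, size_t n)
    {
        assert(values != NULL || n == 0);
        long total = 0;
        for (size_t i = 0; i < n; ++i)  /* "<", not "<=" */
            total += values[i];
        return total;
    }
    ```

    The point is less these two functions than the habit: every buffer gets its size passed along with it, and every loop bound gets checked against the first and last index.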

  3. #3
    In the Land of Diddly-Doo g4j31a5's Avatar
    Join Date
    Jul 2006
    Posts
    476
    I've done that but can't see anything weird. BTW, one more thing I don't get: the treatment is the same whether it crashes or not, namely nothing at all. There's no input from any external source, not even the keyboard. It's just the application running its own routines all day long, at least for now. Maybe the problem is a call through a null pointer like you said, but I don't know which one, because object creation / deletion is automated by the application itself with some sort of scheduler.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Is this the debug build or the release build?

    Even if it's the release build, compile it with debug information.

    Install WinDbg and just run the program from within the debugger (no breakpoints or anything, just run it).
    If it does crash, at least you'll find yourself inside the debugger, and not at a meaningless dialog going nowhere.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    verbose cat
    Join Date
    Jun 2003
    Posts
    209
    Make sure you are setting pointers to NULL once you free the memory they point to. If you do int *ptr = malloc(MAGIC_NUMBER); and then later free(ptr);, and then dereference ptr before setting it to NULL or re-assigning it to some other chunk of memory, you are likely to experience the exact problem you are describing. ptr is a dangling pointer once you free it and before you re-assign it.

    Once you call free, anything can happen to that block of memory, from nothing to the operating system reclaiming it for another process. If you get "lucky" it will be left alone, even after additional calls to malloc or new. For instance, malloc's algorithm could be skipping this block for some reason, so every time you dereference that dangling pointer, you happen to be referring to the old value and everything seems to work.

    Later (the next few cycles, an hour later, however long it takes for you to call malloc enough that it arrives back at that particular block of memory) the memory is finally overwritten with something else. Now when you dereference ptr, it might give weird results, or it might cause a seg fault, or it might continue operating with no obvious effect. Since you have no way to know what the memory block was overwritten with, you have no way to know how or why it is acting that way (unless you use a debugger and look at the value of the block ptr is pointing to).

    So, while you don't see any difference between the first run and the second, the algorithm malloc is using might be taking different paths, or the operating system might be shuffling memory around, and that dangling pointer is left out of the loop because the memory it points to is technically marked as available, etc.
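
    A minimal sketch of the "free it, then NULL it" habit described above. FREE_AND_NULL and read_checked are hypothetical names invented for this example, not part of any library.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Frees p and clears it in one step. free(NULL) is a no-op, so using
     * the macro twice on the same pointer is harmless. */
    #define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

    /* Hypothetical accessor: a cleared pointer is caught here instead of
     * silently reading recycled memory. */
    int read_checked(const int *p, int fallback)
    {
        assert(p != NULL);                  /* debug builds stop at the culprit */
        return p != NULL ? *p : fallback;   /* builds with NDEBUG fall back instead */
    }
    ```

    Note that this only protects the one pointer variable you clear. Any other copy of the same address elsewhere in the program is still dangling, so it narrows the search rather than ending it.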
    abachler: "A great programmer never stops optimizing a piece of code until it consists of nothing but preprocessor directives and comments "

  6. #6
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Also, if you have access to a code linter, run it over the program and pay attention to its warnings.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Adding logging to the code would also help. Even if the log doesn't show where the program actually goes wrong or why, a record of what it was doing is very helpful for working out the steps needed to reproduce the problem. Record each user action and/or data input; you may then notice something that differs between the crashing and non-crashing runs of what look like the same steps.
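
    Something along these lines would do as a starting point. It is only a sketch: log_open, log_msg and log_close are names made up for this example, and the file name passed in is up to you.

    ```c
    #include <assert.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <time.h>

    static FILE *g_log = NULL;      /* hypothetical single global log file */

    /* Opens (or appends to) the log file. Returns 0 on success, -1 on failure. */
    int log_open(const char *path)
    {
        g_log = fopen(path, "a");
        return g_log != NULL ? 0 : -1;
    }

    /* Writes one timestamped, printf-style line to the log. */
    void log_msg(const char *fmt, ...)
    {
        assert(fmt != NULL);
        if (g_log == NULL)
            return;

        char stamp[32];
        time_t now = time(NULL);
        struct tm *tm_now = localtime(&now);
        if (tm_now != NULL &&
            strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm_now) > 0)
            fprintf(g_log, "[%s] ", stamp);

        va_list args;
        va_start(args, fmt);
        vfprintf(g_log, fmt, args);
        va_end(args);

        fputc('\n', g_log);
        fflush(g_log);  /* hand the line to the OS now, so it is not lost
                           in the stdio buffer if the process crashes */
    }

    void log_close(void)
    {
        if (g_log != NULL) {
            fclose(g_log);
            g_log = NULL;
        }
    }
    ```

    Typical use would be log_open("app.log") at startup and then calls such as log_msg("scheduler created object %d", id) around every creation and deletion. The fflush after each line costs some speed, but for a crash that takes a day to appear, the last line in the file is the most valuable one.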

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    Have you run Bounds Checker or something similar on your program?

  9. #9
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Are you using threads?

    Remember that under Windows you have to call CloseHandle() on a thread's handle after the thread finishes, or you leak handles and eventually the OS will refuse to give you any more of them. A simple way to check whether this is the case is to look in Task Manager under the Performance tab and see if the handle count is gradually creeping up.

  10. #10
    In the Land of Diddly-Doo g4j31a5's Avatar
    Join Date
    Jul 2006
    Posts
    476
    Lots of replies already. Thanks guys. I'll try to answer them all at once.

    @Salem: It's the release build. Now that I think about it, the debug version won't run because it always shows an error. But the weird thing is, the release build doesn't show this particular error at all. FYI, the code that gives the error is actually from another programmer. He said it's fine as long as the release build doesn't show this error, and I just took his word for it.

    @jEssYcAt: Maybe that's the problem. I actually have an idea who the culprit is (see the reply to Salem above). I admit, I haven't checked this code at all. I just assumed it worked based on what that programmer said. He also used this code in his own application, and it (seemed to be) working just fine.

    @CornedBee: What's a code linter?

    @matsp: Actually, that idea has occurred to me. But I haven't done it yet because I still need to do some other things.

    @medievalelks: What's a Bounds Checker?

    @abachler: Not that I know of. But I don't know if the other guy used a thread in his code.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > He said it's fine as long as the release doesn't show this error.
    He's an idiot!
    If debug and release builds differ "AT ALL" in any respect except speed, then you've got a problem.

  12. #12
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Originally posted by g4j31a5:
    Now that I think about it, the debug version won't run because it always shows an error.
    That's an obvious place to start, isn't it?
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I'm with Sang-drax and Salem here: It is highly likely that the debug build is "correctly" pointing out something that is wrong, whilst the release build is missing it, and most of the time it's not making a lot of difference, but sometimes causes a crash. Typically, this is "out of bounds" on memory allocations or arrays.
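
    As an illustration of how a debug build can notice an overrun that a release build silently tolerates, here is a sketch of a guard-byte allocator. This is an assumption about the kind of check involved, not a description of the error in this particular program; guarded_malloc, guarded_check, guarded_free and the 0xFD pattern are all chosen just for this example.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define GUARD_SIZE 8
    #define GUARD_BYTE 0xFD     /* arbitrary pattern chosen for this sketch */

    /* Header stored in front of each block. The extra union members only
     * serve to keep the user block suitably aligned. */
    typedef union {
        size_t size;            /* number of user bytes requested */
        long double align_ld;
        long long align_ll;
        void *align_ptr;
    } guard_header;

    /* Layout: [header][user bytes ...][GUARD_SIZE guard bytes] */
    void *guarded_malloc(size_t size)
    {
        if (size > SIZE_MAX - sizeof(guard_header) - GUARD_SIZE)
            return NULL;
        guard_header *h = (guard_header *)malloc(sizeof *h + size + GUARD_SIZE);
        if (h == NULL)
            return NULL;
        h->size = size;
        unsigned char *user = (unsigned char *)(h + 1);
        memset(user + size, GUARD_BYTE, GUARD_SIZE);
        return user;
    }

    /* Returns 1 if the guard bytes after the block are intact, 0 otherwise. */
    int guarded_check(const void *p)
    {
        const guard_header *h = (const guard_header *)p - 1;
        const unsigned char *guard = (const unsigned char *)p + h->size;
        for (size_t i = 0; i < GUARD_SIZE; ++i)
            if (guard[i] != GUARD_BYTE)
                return 0;
        return 1;
    }

    /* Checks the guard bytes, then releases the block. */
    void guarded_free(void *p)
    {
        if (p == NULL)
            return;
        assert(guarded_check(p) && "buffer overrun detected");
        free((guard_header *)p - 1);
    }
    ```

    Without the guard bytes, the same one-byte overrun would land in whatever happens to sit after the block, which is why it usually goes unnoticed and only occasionally crashes.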

    --
    Mats

  14. #14
    In the Land of Diddly-Doo g4j31a5's Avatar
    Join Date
    Jul 2006
    Posts
    476
    Right, maybe I'll have a look at the code. Thanks.

  15. #15
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    > @CornedBee: What's a code linter?
    It's a program that examines your code, without compiling it or running it, and identifies bad practises or possible buffer overruns and that kind of thing. I've used splint before, and it works pretty well.
    Wikipedia page: http://en.wikipedia.org/wiki/Splint_...amming_tool%29
    Download page: http://www.splint.org/
    Windows download page: http://www.splint.org/win32.html

    > @medievalelks: What's a Bounds Checker?
    It's a program or a library that examines your program as it runs, detecting any buffer overruns or memory errors. The only one I've really used is Valgrind, which is fantastic, but unfortunately it doesn't run natively under Windows. I've heard of Purify, Electric Fence, and dmalloc, but never tried any of them.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.
