Thread: Help/ideas on how to catch a difficult bug that writes to random memory locations

  1. #1
    Registered User
    Join Date
    Feb 2008
    Posts
    147

    Help/ideas on how to catch a difficult bug that writes to random memory locations

    Hello,

    I have a complex program of around 25,000 lines of code. It is quite stable, but it has a bug that only shows up from time to time, and only in online mode, so it is difficult for me to reproduce. I have put a lot of effort into finding the problem and I know where the code crashes, but that line is not the culprit: it only reads a memory location that is supposed to hold good data and does not. The real bug happens earlier in execution. That data is read-only, in the sense that I compute it once at program start-up and never write to it again, yet something writes to it while the program runs, perhaps through a bad pointer somewhere, and the corruption looks random (different runs overwrite different memory locations).

    My question is: is there a way to find out when my code writes into that memory, which I treat as protected? I am wondering whether there is a method to hash all the variables and arrays in that memory section, or some way to 'assert' when that big section is written, taking into account that "that memory section" could be any of the many variables and arrays I precompute at start-up.

    Any help on how to catch this difficult bug would be welcome.

    Thanks.

  2. #2
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    If the code is compiled without optimisation and with the compiler emitting debug information, a debugger that supports data breakpoints (watchpoints) will be able to identify what code writes to the affected memory.

    The catch is that, unless your code is compiled that way, it will need to be recompiled. And the changes will affect layout of memory in your program, so may cause the offending code to write to a different location in memory.

    You could also add output statements (e.g. write the affected values to stdout or stderr in selected places) and use those output values to localise the code which causes the problem. The catch is that even an additional I/O statement can change the memory layout of the program (that is less likely than with recompiling under different settings, but it is possible).
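
    One way to combine this with the hashing idea from the original question is sketched below: checksum each precomputed block once after start-up, then re-check at selected places. This is only a sketch; guard_register, guard_check, GUARD_CHECK and MAX_REGIONS are made-up names, and FNV-1a is just one convenient checksum, not something the original code is known to use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical registry of the blocks that are computed at start-up. */
struct guarded_region {
    const void *addr;
    size_t      len;
    uint32_t    sum;   /* checksum taken right after initialisation */
};

#define MAX_REGIONS 64
static struct guarded_region regions[MAX_REGIONS];
static size_t region_count;

/* FNV-1a over raw bytes: cheap, and any single-byte change alters it. */
static uint32_t fnv1a(const void *p, size_t len)
{
    const unsigned char *b = p;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* Call once per block after it has been computed. Returns 0 on success. */
int guard_register(const void *addr, size_t len)
{
    if (region_count == MAX_REGIONS)
        return -1;
    regions[region_count].addr = addr;
    regions[region_count].len  = len;
    regions[region_count].sum  = fnv1a(addr, len);
    region_count++;
    return 0;
}

/* Re-checksums every block; returns how many no longer match. */
int guard_check(const char *file, int line)
{
    int bad = 0;
    for (size_t i = 0; i < region_count; i++) {
        if (fnv1a(regions[i].addr, regions[i].len) != regions[i].sum) {
            fprintf(stderr, "%s:%d: region %zu at %p was modified\n",
                    file, line, i, (void *)regions[i].addr);
            bad++;
        }
    }
    return bad;
}

/* Drop this at selected places to narrow down where the damage happens. */
#define GUARD_CHECK() guard_check(__FILE__, __LINE__)
```

    Calling GUARD_CHECK() at the entry and exit of suspect functions brackets the stray write between the last clean check and the first failing one. It only reports after the fact, and the same caveat about changed memory layout applies.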

    If you don't want to do the above (or tried them and found the symptom changed), I suggest eliminating sections of your program systematically. If the symptom vanishes on removing some section of code, that's not definitive, but it does give hints. The possibilities are:

    1) there is a flaw in the code you just removed.
    2) there is a flaw in some code you previously removed, which interacts somehow with the code you just removed.
    3) there is a flaw in some code you haven't removed yet, which interacts with some code you have just removed.
    4) combinations of the above

    There could still be a flaw in code you haven't removed, but a systematic approach keeping the above points in mind can narrow down the odds.

    Notwithstanding your claim that the code is complex, I suggest using code inspection. 25,000 lines may seem like a lot to you, but it's really not that much, unless the code is really badly written. Look for code that uses computed pointers (e.g. a pointer used to select something based on user input) or computed array indices (look for off-by-one errors in loops, falling off the end of an array, indices computed from data in files, etc). Also look for code that is excessively "clever" in playing with pointers - usually programmers who are too clever playing with pointers get it wrong.
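
    While inspecting, suspect array accesses can be routed temporarily through a bounds check so that a bad computed index stops the program at the call site. A minimal sketch follows; index_ok, checked_index, AT and ARRAY_LEN are illustrative names, and the macro only works on true arrays, not on pointers.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if 0 <= i < n, otherwise 0. */
int index_ok(long i, size_t n)
{
    return i >= 0 && (size_t)i < n;
}

/* Returns i if it is in range; otherwise reports the call site and aborts. */
size_t checked_index(long i, size_t n, const char *file, int line)
{
    if (!index_ok(i, n)) {
        fprintf(stderr, "%s:%d: index %ld out of range [0, %zu)\n",
                file, line, i, n);
        abort();
    }
    return (size_t)i;
}

/* AT(arr, i) behaves like arr[i] but stops the program on a bad index. */
#define AT(arr, i) \
    ((arr)[checked_index((long)(i), ARRAY_LEN(arr), __FILE__, __LINE__)])
```

    With int table[4], a loop written as for (i = 0; i <= 4; i++) AT(table, i) = 0; aborts when i reaches 4 instead of silently overwriting whatever sits after the array.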

    Welcome to the land of debugging. It is VERY difficult to write code that is bug free. It is often VERY VERY difficult to find a bug once it exists.
    Right 98% of the time, and don't care about the other 3%.


  3. #3
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Which OS / Compiler are you using? This helps us identify suitable tools.

    For example, if you set up a large block of memory at initialisation time which is supposed to be read-only but is later corrupted, then one thing to look into is how your OS could protect areas of memory by marking them read-only.

    Then at least the program would crash on the errant write instruction (giving you useful information), rather than having to wait until something stumbles into the minefield.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
