Thread: Need Help finding potential Memory issue

  1. #1
    Registered User
    Join Date
    Oct 2008
    Posts
    11

    Need Help finding potential Memory issue

    Hey all,

    I've got a question...

    The company I work for is a supply chain software company, and we have some old legacy code that is producing core dumps. I've used GDB to look at a core dump and find where the crash occurs. Here is the important part:
    Code:
    #0  0x60000000ca9f0560:0 in sFreePckMovPath (PckMovPaths=0x27272020)
        at varAllocInv.c:9768
    9768            while (PckMovPaths->prev)
    (gdb) bt
    #0  0x60000000ca9f0560:0 in sFreePckMovPath (PckMovPaths=0x27272020)
        at varAllocInv.c:9768
    #1  0x60000000ca9bbc70:0 in vartrnFreePickList (List=0x403b58d0)
        at varAllocInv.c:9846
    #2  0x60000000ca9f47d0:0 in vartrnAllocateInventory (
        pcktyp_i=0x40654700 "TOPOFF-REPLEN", prtnum_i=0x40e2c090 "64871", 
        prt_client_id_i=0x40e22f10 "MDC", pckqty_i=400, 
        orgcod_i=0x40e8e690 "----", revlvl_i=0x40e2a190 "----", 
        lotnum_i=0x40e81ab0 "----", invsts_i=0x40d44270 "A", 
        invsts_prg_i=0x7c6ec369 "", schbat_i=0x0, ship_id_i=0x0, 
        ship_line_id_i=0x0, wkonum_i=0x0, wkorev_i=0x0, wkolin_i=0x0, 
        carcod_i=0x0, srvlvl_i=0x0, client_id_i=0x0, ordnum_i=0x0, stcust_i=0x0, 
        rtcust_i=0x0, ordlin_i=0x0, ordsln_i=0x0, concod_i=0x0, 
        segqty_i=0x40e2df10 "400", stoloc_i=0x40a43920 "", lodnum_i=0x0, 
        subnum_i=0x0, dstare_i=0x40e2dd30 "", dstloc_i=0x40e2a790 "FA31", 
        Picks=0x7fff9090, srcare_i=0x40e97af0 "", pcklvl_i=0x40e2a6f0 "", 
        pcksts_i=0x0, splflg_i=1, ovralcflg_i=0x0, untcas_i=5, untpak_i=1, 
        untpal_i=0, min_shelf_hrs_i=-1, frsflg_i=0, frsdte_i=0x7fff90a0 "", 
        trace_suffix_i=0x0, pipcod_i=0x40e8f090 "N", skip_invlkp_i=0, 
        alloc_loc_flg_i=0, wrkzon_i=0x0, wrkzon_req_i=0, aisle_id_i=0x0, 
        aisle_req_i=0, work_area_i=0x0, work_area_req_i=0) at varAllocInv.c:10703
    Here is the function where the core dump seems to be rooted:
    Code:
    static void sFreePckMovPath(PICK_MOVS * PckMovPaths)
    {
        PICK_MOVS          *pckmovs;
        PICK_MOV           *mov;

        if (!PckMovPaths)
            return;

        /* Go To Top of List */
        while (PckMovPaths->prev)
            PckMovPaths = PckMovPaths->prev;

        while (PckMovPaths)
        {
            pckmovs = PckMovPaths;
            if (pckmovs->pckMovPath)
            {
                while (pckmovs->pckMovPath)
                {
                    mov = pckmovs->pckMovPath;
                    pckmovs->pckMovPath = mov->next;
                    free(mov);
                }
            }
            PckMovPaths = pckmovs->next;
            free(pckmovs);
        }

        return;
    }
    PICK_MOVS is a typedef for a structure. I have a couple of questions about this:

    1) Since the last thing the core dump shows is "while (PckMovPaths->prev)" does that mean it's actually core dumping on that line of code? Or somewhere after that?

    2) This file is 11,433 lines long and this pointer is being passed ALL OVER the place in here. As I said, it's not my code; it's some old legacy stuff that is still in use. My suspicion is that the pointer may have already been freed but was not set to NULL, because the above function gets called like this:
    Code:
    if (PickFoundList)
    {
        sFreePckMovPath(PickFoundList);
        PickFoundList = NULL;
    }
    Is there any way to check whether a pointer has already been freed? I've never heard of such a test. Or does anyone have any suggestions to help me start digging through this massive code to find a memory issue?
    Last edited by Salem; 10-23-2008 at 09:29 AM. Reason: more tags

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    If it's legacy code that has been working and now isn't, the problem could be due to some environmental issue, where an interface has changed, or a system call that used to work isn't working now, or something like that. Therefore, I would look at all the places in your code where you are supposed to be checking for proper results and are not. Add those checks where they are missing, and you have a pretty good chance of finding the reason for the bug.

    Now, that's the reason for the bug.

    The bug may very well be using a pointer that is not valid in the first place, as opposed to some section of code that, for instance, tries to free a block of memory twice.

    Aside from that, this code could be suspect:
    Code:
    mov = pckmovs->pckMovPath;
    pckmovs->pckMovPath = mov->next;
    You are picking up "mov" and not testing it to be a valid pointer.
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > while (PckMovPaths->prev)
    What's the betting that your previous implementation of malloc cleared memory, saving you the "inconvenience" of having to do node->prev = NULL at some point in the code?

    Of course, now memory is filled with junk (27272020 seems like a pretty good pattern).

    On the other hand, it's also the string "''  " (two apostrophes followed by two spaces), which, though unlikely, could indicate a buffer overrun of some sort. Again, the amount of slack space between adjacent memory blocks for different malloc implementations could easily result in this kind of problem remaining hidden for a long time.
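
    To see why 0x27272020 reads as text, here is a minimal sketch that prints a 32-bit value as four ASCII characters, most significant byte first. The helper name word_to_ascii is made up for illustration and is not part of the code under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write the four bytes of a 32-bit value, most
   significant byte first, into out as a NUL-terminated string.
   Bytes outside the printable ASCII range are shown as '.'. */
void word_to_ascii(uint32_t word, char out[5])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)(word >> (24 - 8 * i));
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[4] = '\0';
}
```

    Feeding it 0x27272020 yields two apostrophes (0x27) followed by two spaces (0x20). A "pointer" that decodes to readable text like this is a common hint that string data was written over it.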


    > 1) Since the last thing the core dump shows is "while (PckMovPaths->prev)" does that mean it's actually core dumping on that line of code? Or somewhere after that?
    Nope, that's where it is in this instance.

    A couple of things to try.
    1. valgrind runs your program in a kind of VM which validates all memory accesses. A bit on the slow side, but it will pinpoint all the dodgy memory accesses.

    2. Electric fence does something similar, but you need to relink the code with say
    gcc prog.c -lefence

    Run the result in the debugger and it will trap on the instruction which steps out of bounds.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If this is a port from 32-bit to 64-bit, perhaps there are parts of the code that assume pointers are 32 bits wide? Most of these cases would be detected as warnings by the compiler, but it's entirely possible, with some code patterns, to fool the compiler into a situation where it doesn't detect such things.
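
    As a rough illustration of that kind of assumption (not taken from the code in this thread), the sketch below checks whether a pointer survives being squeezed through a 32-bit integer, and shows the safe alternative of using uintptr_t. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if p is unchanged after a round trip
   through a 32-bit integer, 0 if the upper bits were lost. */
int survives_32bit_round_trip(const void *p)
{
    uintptr_t full = (uintptr_t)p;
    uint32_t narrow = (uint32_t)full;   /* what "int-sized" storage keeps */
    return (uintptr_t)narrow == full;
}

/* uintptr_t is wide enough to hold any object pointer, so this round
   trip gives back the original pointer on both 32-bit and 64-bit builds. */
const void *round_trip_uintptr(const void *p)
{
    return (const void *)(uintptr_t)p;
}
```

    On a 64-bit build the first function can return 0 for addresses above 4 GB, which is exactly the case where code that stores pointers in int or unsigned int fields starts to fail intermittently.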

    Code:
    vartrnAllocateInventory (
        pcktyp_i=0x40654700 "TOPOFF-REPLEN", prtnum_i=0x40e2c090 "64871", 
        prt_client_id_i=0x40e22f10 "MDC", pckqty_i=400, 
        orgcod_i=0x40e8e690 "----", revlvl_i=0x40e2a190 "----", 
        lotnum_i=0x40e81ab0 "----", invsts_i=0x40d44270 "A", 
        invsts_prg_i=0x7c6ec369 "", schbat_i=0x0, ship_id_i=0x0, 
        ship_line_id_i=0x0, wkonum_i=0x0, wkorev_i=0x0, wkolin_i=0x0, 
        carcod_i=0x0, srvlvl_i=0x0, client_id_i=0x0, ordnum_i=0x0, stcust_i=0x0, 
        rtcust_i=0x0, ordlin_i=0x0, ordsln_i=0x0, concod_i=0x0, 
        segqty_i=0x40e2df10 "400", stoloc_i=0x40a43920 "", lodnum_i=0x0, 
        subnum_i=0x0, dstare_i=0x40e2dd30 "", dstloc_i=0x40e2a790 "FA31", 
        Picks=0x7fff9090, srcare_i=0x40e97af0 "", pcklvl_i=0x40e2a6f0 "", 
        pcksts_i=0x0, splflg_i=1, ovralcflg_i=0x0, untcas_i=5, untpak_i=1, 
        untpal_i=0, min_shelf_hrs_i=-1, frsflg_i=0, frsdte_i=0x7fff90a0 "", 
        trace_suffix_i=0x0, pipcod_i=0x40e8f090 "N", skip_invlkp_i=0, 
        alloc_loc_flg_i=0, wrkzon_i=0x0, wrkzon_req_i=0, aisle_id_i=0x0, 
        aisle_req_i=0, work_area_i=0x0, work_area_req_i=0)
    How many parameters does that function take? Ever heard of putting the values in a struct and passing the address of the struct? If you didn't write this yourself, perhaps you can forward this suggestion to whoever did...
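
    A minimal sketch of the struct approach follows. The struct alloc_request and the function alloc_request_full_cases are invented for illustration; the field names are borrowed from the backtrace, but their types are guesses.

```c
#include <stddef.h>

/* Hypothetical request struct: a handful of the backtrace's arguments
   gathered into one object. Types are assumptions. */
typedef struct alloc_request {
    const char *pcktyp;         /* e.g. "TOPOFF-REPLEN" */
    const char *prtnum;         /* part number */
    const char *prt_client_id;  /* client id */
    long        pckqty;         /* quantity to pick */
    long        untcas;         /* units per case */
} alloc_request;

/* Toy consumer: number of full cases in the request, or -1 if the
   request is missing or has a non-positive units-per-case value. */
long alloc_request_full_cases(const alloc_request *req)
{
    if (req == NULL || req->untcas <= 0)
        return -1;
    return req->pckqty / req->untcas;
}
```

    A caller fills in only the fields it needs, for example with a designated initializer such as { .pckqty = 400, .untcas = 5 }, and passes &req. Unnamed fields are zero-initialized, which avoids the long run of explicit 0x0 arguments seen in the backtrace.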

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Registered User
    Join Date
    Oct 2008
    Posts
    11
    Well, the code is running on an HP-UX ia64 machine, so it's definitely running on a 64-bit box. I didn't write this code at all, so I'm not sure whether it was originally written for 32-bit or 64-bit. This is some old stuff, probably written upwards of 8-10 years ago, that has been tweaked here and there.

    They do use structs to hold a lot of stuff; pckmov and mov are both structs but I don't really have a choice to go back and rewrite it all as this is part of our base code. Unfortunately I'm stuck trying to debug the way it's written now and just try to fix this issue.

    It doesn't core dump every time it's executed, though: probably 4-8 times a week, when it's run upwards of millions of times during a given week.

  6. #6
    Registered User
    Join Date
    Oct 2008
    Posts
    11
    Maybe I exaggerated there just a bit haha! It's probably not run millions of times, but it is certainly run thousands.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Then my guess would be that certain input patterns cause a problem - if you run the code with the same parameters, does it crash every time?

    --
    Mats

  8. #8
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    After reading your code more closely, I can see that my first suggestion, to check the pointer, was worthless.

    However, the bug could be that PckMovPaths->prev was never initialized properly for the first element (or any other element for that matter).
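
    One way to rule that out is to route every allocation through a single constructor that sets each link explicitly. This is only a sketch: the node type below is a hypothetical stand-in for PICK_MOVS, whose real layout isn't shown in this thread.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PICK_MOVS. */
typedef struct node {
    struct node *prev;
    struct node *next;
    void        *payload;
} node;

/* Allocate a node with every link explicitly set to NULL, so a later
   walk such as "while (n->prev)" stops at the head instead of
   following whatever malloc happened to leave in the block. */
node *node_new(void)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->prev = NULL;
    n->next = NULL;
    n->payload = NULL;
    return n;
}
```

    Using calloc instead of malloc would also zero the block, but spelling out the assignments keeps the intent visible and doesn't rely on an all-zero-bits pattern being a null pointer.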

  9. #9
    Registered User
    Join Date
    Oct 2008
    Posts
    11
    Thanks guys! I finally figured out how to reproduce the issue (I've been trying for 2 days) and I'll post back when I can find the root cause.

  10. #10
    Registered User
    Join Date
    Oct 2008
    Posts
    11
    So this ended up being a combination of problems....

    PckMovPaths is part of a linked list of structures that lives inside another structure. In one very specific situation the outermost structure was initialized and had memory allocated for it, but the process then stopped because some specific data didn't match. The memory that the outermost structure points to gets freed, but the pointer is not set back to NULL. The next time through, since the pointer is not NULL, the code assumes it needs to be freed, and sFreePckMovPath() gets called without PckMovPaths->prev ever having been allocated or initialized.
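
    A common guard against this kind of stale pointer is a free routine that takes the address of the caller's pointer and clears it itself, so no call site can forget the reset. The sketch below uses a simplified, hypothetical pick_list type rather than the real PICK_MOVS, and it is not the fix that was actually applied.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified list node. */
typedef struct pick_list {
    struct pick_list *prev;
    struct pick_list *next;
} pick_list;

/* Free the whole list that *listp belongs to, then set *listp to NULL.
   Calling it again on the same variable is a harmless no-op. */
void pick_list_free(pick_list **listp)
{
    assert(listp != NULL);

    pick_list *cur = *listp;
    if (cur == NULL)
        return;

    /* Rewind to the head of the list. */
    while (cur->prev)
        cur = cur->prev;

    /* Free every node, reading next before the node is released. */
    while (cur) {
        pick_list *next = cur->next;
        free(cur);
        cur = next;
    }

    *listp = NULL;
}
```

    This only clears the one pointer passed in. Any other copies of the pointer held elsewhere still dangle, so in an 11,000-line file it reduces the risk rather than removing it.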
