Thread: malloc invalid address alignment

  1. #1
    Registered User
    Join Date
    May 2004
    Posts
    8

    malloc invalid address alignment

    I have some old code that has outgrown the 32-bit address space, so I'm converting it to run as a 64-bit binary instead. It compiles fine, but when I run it I get a core dump. I've traced it down to the following piece of code, but I can't figure out why it's broken or how to fix it. The code is run on sol2.8 machines and I appear to get the same result with either SUNspro/cc or gcc.

    I'm not much of a programmer, so I debugged it down to the malloc line below with printf statements:

    Code:
        TestCellData *testCellData;
        CellData *cellData;
        int c;
        long *cellWord;
    
        if ((cellData = malloc(sizeof(CellData))) == NULL) {
            Fatal("out of memory (line %s:%d)\n", __FILE__, __LINE__);
        }
    CellData is defined in a header file:

    Code:
    #define CELL_DATA_BYTES 64
    typedef unsigned char CellData[CELL_DATA_BYTES];
    Then in a final gasp I tried my hand at dbx and it seems to yield a bit more information:

    Code:
    Current function is CreateCellData
       40       if ((cellData = malloc(sizeof(CellData))) == NULL) {
    signal BUS (invalid address alignment) in _smalloc at 0xffffffff7ea4928c
    0xffffffff7ea4928c: _smalloc+0x0094:    ldx     [%o0 + 0x10], %g4
    So, can someone explain to me why the new 64-bit addresses aren't aligned? I tried casting malloc a few ways, but nothing seemed to help. Note that this same line works fine several times before the core dump. Perhaps the code is fine and I have an environment issue (I assume _smalloc is in some lib*.so file somewhere)?

    Any suggestions would be greatly appreciated.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Step 1 is to simply try a simple program containing just that single call to malloc, and enough declarations to make it work.

    If that succeeds (as it should), then the problem is most likely elsewhere in your code. Most likely in your use (or mis-use) of some previously allocated memory.

    Corruption of the malloc memory pool basically means that at some later point (perhaps never), you will get a failure in an apparently unrelated call to malloc or free.
    One of the ways you get to find this out is by porting your application to another OS/Compiler.

    I would try running your program on the 32-bit system (where it basically works), built with a malloc debugger such as Electric Fence. That should tell you if you're overstepping any malloc'ed memory, potentially causing problems later on.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    May 2004
    Posts
    8
    > Step 1 is to simply try a simple program containing just that single call to malloc, and enough declarations to make it work.
    I don't think I was clear, this code has been used for years in 32-bit form without problems. Only when compiled into 64-bit form is the alignment an issue. Additionally, this exact same malloc line successfully executes a half dozen times or so before the non-aligned address is returned.

    Perhaps that will help jog something for someone...
    Last edited by code2big; 05-22-2004 at 03:47 PM. Reason: didn't proof read

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Do you equate "working" with "bug-free" ?

  5. #5
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Hate to ask the obvious, but have you included <stdlib.h>?

  6. #6
    Registered User
    Join Date
    May 2004
    Posts
    8
    Yes, it's in there and several others:

    Code:
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <malloc.h>
    #include <string.h>
    #include <math.h>
    Thanks for asking.

  7. #7
    Registered User pinko_liberal's Avatar
    Join Date
    Oct 2001
    Posts
    284
    Code:
    cellData = malloc(sizeof(CellData))
    I don't know what the difference between CellData and cellData is, but is it equivalent to
    Code:
    cellData = malloc(sizeof(*cellData))
    which is what I guess you are trying to do?
    The one who says it cannot be done should never interrupt the one who is doing it.

  8. #8
    Registered User pinko_liberal's Avatar
    Join Date
    Oct 2001
    Posts
    284
    Sorry, I missed the defns.
    Can't see anything you are doing wrong.

  9. #9
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    >Additionally, this exact same malloc line successfully executes a half dozen times or so before
    There's a good clue.

  10. #10
    Obsessed with C chrismiceli's Avatar
    Join Date
    Jan 2003
    Posts
    501
    why are you including malloc.h? malloc is declared in stdlib.h
    Help populate a c/c++ help irc channel
    server: irc://irc.efnet.net
    channel: #c

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    To code2big

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    int main ( ) {
        int *p = malloc( 10 * sizeof *p );
        int i;
        for ( i = 0 ; i <= 10 ; i++ ) {
            p[i] = 0;
        }
        free( p );
        p = malloc( 10 );
        for ( i = 0 ; i < 10 ; i++ ) {
            p[i] = 0;
        }
        free( p );
        return 0;
    }
    Do you know that the above code has a bug (two actually)?
    And I'm not talking about the two missing p == NULL tests either.

    Will it cause a crash?

    If I add a third call to malloc, and that call crashes, where will the problem be?
    a) in the 3rd call to malloc
    b) in the abuse of memory following the first two calls to malloc

    Does knowing that it doesn't crash on my machine mean it won't crash on your machine?

    > cellData = malloc(sizeof(CellData))
    > I don't know what the difference between CellData and cellData is, but is it equivalent to
    > cellData = malloc(sizeof(*cellData))
    OMG, I thought I'd missed the oldest trick in the book
    Code:
        if ((cellData = malloc(sizeof(CellData))) == NULL) {
        }
        if ((cellData = malloc(sizeof(cellData))) == NULL) {
        }
    The first is OK, but the 2nd one is hopelessly wrong - yet both compile.

    Having two objects with identical spelling separated only by the case of a single letter
    cellData vs. CellData
    is a really bad idea.

    From what you're saying, this is a large program potentially written by many people.
    What are the chances that one forgotten (and very hard to spot) case change isn't still in your program?

    > this code has been used for years in 32-bit form without problems
    Do you know what this means?
    It means people stopped looking for bugs when it stopped crashing, not when it was bug free. This is pretty much true of all debugging.
    It's only when it's ported to a different machine that previously hidden problems are flushed out.

  12. #12
    Registered User
    Join Date
    May 2004
    Posts
    8
    Quote Originally Posted by swoopy
    >Additionally, this exact same malloc line successfully executes a half dozen times or so before
    There's a good clue.
    Ya, I agree. This seems to be telling me something, but I'm not clear on exactly what. In general, this seems to confirm some of the comments that at least the commands that execute just before the crash are not miscoded (for lack of a better term).

    Care to speculate further?

  13. #13
    Registered User
    Join Date
    May 2004
    Posts
    8
    Quote Originally Posted by chrismiceli
    why are you including malloc.h? malloc is declared in stdlib.h
    I could speculate, but I just don't know. I took a look at malloc.h and I didn't see any obvious reason for it, so I took it out. The results were the same as before, so it seems it was erroneously included. However, it doesn't seem to have been problematic either.

    Anything else appear out of place to you?

    Thanks.

  14. #14
    Registered User
    Join Date
    May 2004
    Posts
    8
    Salem,
    Quote Originally Posted by Salem
    Will it cause a crash?

    If I add a third call to malloc, and that call crashes, where will the problem be?
    a) in the 3rd call to malloc
    b) in the abuse of memory following the first two calls to malloc

    Does knowing that it doesn't crash on my machine mean it won't crash on your machine?
    Agreed, memory corruption issues are amongst the most painful problems to isolate and correct.

    Quote Originally Posted by Salem
    OMG, I thought I'd missed the oldest trick in the book
    Code:
        if ((cellData = malloc(sizeof(CellData))) == NULL) {
        }
        if ((cellData = malloc(sizeof(cellData))) == NULL) {
        }
    The first is OK, but the 2nd one is hopelessly wrong - yet both compile.
    True, this would be pathological if in some cases both did the same thing too. That doesn't seem likely in this particular case though, since CellData is an array of 64 char, so sizeof(CellData) returns 64 (on both 32- and 64-bit Solaris, I think). sizeof(cellData) seems to return the address width (4 or 8 respectively).

    A quick comment on the code size: the project was probably written by 3 or 4 people, but I really have no idea. All the source code is kept in a single directory and a quick "wc *[ch]" shows a little over 15K total lines. Given that, I grep'ed out the malloc commands (there are only about 2 dozen) and did a close visual inspection for these potential letter case problems. Similarly, I wanted to check that each one was similar in format to the (presumably correct) cellData expression above. They seem OK.

    Quote Originally Posted by Salem
    Having two objects with identical spelling separated only by the case of a single letter
    cellData vs. CellData
    is a really bad idea.
    Yes, you are certainly right here. It is quite difficult to read at times. I don't know exactly why it was done this way, but it was apparently some type of naming convention, since it's done pretty consistently throughout the code. It appears to match naming conventions used in other (non c/c++) hardware design tools more central to the overall project.

    Anyway, thanks for looking at this. 64-bit logic tools (as well as Sun's 64-bit env) are just now coming to maturity (at least in our design env anyway). I've had to integrate several 64-bit mechanisms when other projects outgrew 32, but this is a little different than the previous cases. There are a lot more files in this case, but mostly the interface to this C code is a tcl interpreter. Thus before I could even get started, I needed to compile 64-bit tcl. Tcl8.0 was used in the original project, but unfortunately the tcl8.0 configure file doesn't support 64-bit Solaris.

    I got it built, but I had to hack the Makefile a good bit. Now I find it suspect due to my 64-bit env at the time. I've gone through all of the env stuff pretty thoroughly now, and I'm going to go back and thoroughly test (and probably rebuild with the improved env) my tcl installation. If it's OK, I think I will have eliminated all of the other potential causes save memory corruption, gag.

  15. #15
    Registered User
    Join Date
    May 2004
    Posts
    8
    It turns out that my environment and my tcl8.0 compile were fine. There was an attempt in the C code to fill a 64-byte memory block one random word at a time. That is, 16 random "long" operands are stuffed in using (long *), so when long went from 4 bytes to 8 in 64-bit mode, 128 bytes were written to memory and not the 64 that were intended.

    On to the next problem. Thanks again all.
