Thread: unions, alignment, int pointers, wow!

  1. #1
    Registered User
    Join Date
    Apr 2011
    Posts
    8

    unions, alignment, int pointers, wow!

    Following my first post about alignment of int pointers (April 4th), now I find that unions work ok. Let me summarize:

    I recently discovered that on some compilers/hardware you cannot cast a char pointer to an int pointer unless you pay attention to alignment. The following will not work (at least not everywhere):

    char Message[10] = { 0x7F, 0x00, 0x00, 0x00, 0x01 /*, ...*/ };
    ...
    int addr = *(int *)&Message[1] ;

    What I have now discovered, to my surprise, is that the following DOES work where the cast above doesn't:

    struct meaning {
    char gap ;
    int address ;
    char remaining[5] ;
    } ;

    union {
    char message[10] ;
    struct meaning data ;
    } um ;

    ... /* copy into message */
    memcpy (um.message, Message, 10) ;

    int addr = um.data.address ;

    Could anybody explain to me why? Why can't I cast a pointer to an unaligned int, yet I CAN map the data into a union and access the int part even though it is not aligned?
    I was told not to expect good results from non-aligned pointers, and that the standard calls this "undefined behaviour". Does the standard explicitly say that, for unions, the compiler has to cope even in non-aligned situations?
    Many thanks...
    ...marcelo

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Check out the Wikipedia article "Data structure alignment". It explains struct alignment pretty well. As for unions, IIRC, a union is aligned according to the member with the strictest alignment requirement. In your case, the struct (because of the padding between the gap and address members) forces the union to be aligned on an appropriate boundary, probably a multiple of 4. In a union, all members start at the same address and overlap, so message ends up being aligned the same as data.
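
    A minimal sketch of how one might check this claim on a given compiler, assuming C11 (for _Alignof). The union tag message_view and the three helper functions are names made up for illustration; only struct meaning comes from the thread.

```c
#include <stddef.h>

struct meaning {
    char gap;
    int  address;
    char remaining[5];
};

/* Hypothetical tag; the original post leaves the union unnamed. */
union message_view {
    char           message[10];
    struct meaning data;
};

/* Alignment the compiler requires for the whole union. */
size_t union_alignment(void)  { return _Alignof(union message_view); }

/* Alignment required by the struct member on its own. */
size_t struct_alignment(void) { return _Alignof(struct meaning); }

/* Offset of the int inside the struct; any padding after gap shows up here. */
size_t address_offset(void)   { return offsetof(struct meaning, address); }
```

    If union_alignment() matches struct_alignment() and address_offset() is a multiple of the int alignment, then the int inside the union is aligned and the char array simply inherits that alignment.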

  3. #3
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    A union is like a struct, but instead of placing its members beside each other it places them on top of each other. The result is a block of memory the size of the largest member of the union (plus any padding needed for alignment). Since the members are piled on top of each other (i.e. occupy the same memory), putting something into one means you get something out of the other. The alignment is a result of the way the compiler sizes the union.

    In fact that is one of the good uses for unions...

    Code:
    union t_Pile
      { unsigned char bytes[sizeof(int)];
         unsigned int   integer;
       } Pile;
    
    Pile.integer = 1000;
    
    unsigned char x = Pile.bytes[1]; // retrieve the second byte from the integer.
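
    Which byte comes back from Pile.bytes[1] depends on the host's byte order. The sketch below wraps the same union trick in two helpers (byte_in_memory and host_is_little_endian are hypothetical names, not from the post) so the difference can be observed.

```c
#include <stddef.h>

union int_bytes {
    unsigned int  integer;
    unsigned char bytes[sizeof(unsigned int)];
};

/* Return byte number index of value as it is laid out in memory. */
unsigned char byte_in_memory(unsigned int value, size_t index)
{
    union int_bytes view;
    view.integer = value;
    return view.bytes[index];
}

/* Return 1 when the least significant byte is stored at the lowest address. */
int host_is_little_endian(void)
{
    return byte_in_memory(1u, 0) == 1u;
}
```

    For the value 1000 (0x03E8), a little-endian host stores 0xE8 first and 0x03 second, so index 1 yields 0x03; a big-endian host with a 4-byte int stores the same two bytes last, so index 1 yields 0x00.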

  4. #4
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    The short answer is that you are actually doing the following:
    Code:
    struct meaning {
    char gap ;
    char padding[3]; /* inserted by the compiler so that next element starts at multiple of 4 or 8 whatever is native machine */
    int address ;
    char remaining[5] ;
    } ;
    This is for illustration purposes only.

  5. #5
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    You've got it wrong. Unions don't magically fix alignment concerns. Structs do though.

    All members of a struct are required to be addressable - which, if there is a problem with alignment of types, means the compiler has to introduce padding between struct members.

    Using such a struct as a member of a union does not change that.

    If that's happening in your case, one side effect is that sizeof(struct meaning), and therefore sizeof your union, exceeds 10: there will be padding between gap and address in your struct meaning.
    Right 98% of the time, and don't care about the other 3%.


  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    I think you need to check the alignment of each member in your struct.
    Code:
    $ cat bar.c
    #include <stdio.h>
    #include <stddef.h>
    #include <stdlib.h>
    
    typedef struct meaning {
    char gap ;
    int address ;
    char remaining[5] ;
    }meaning;
    
    int main ( ) {
      printf("Size=%zu\n", sizeof(meaning));   /* %zu is the size_t conversion */
      printf("Offset=%zu\n", offsetof(meaning,address));
      return 0;
    }
    $ gcc bar.c
    $ ./a.out 
    Size=16
    Offset=4
    On most machines, your struct is neither 10 bytes long, nor is your integer at offset 1.

    To pack as you want, you typically need something like a pragma (or attributes in gcc)
    Code:
    $ cat bar.c
    #include <stdio.h>
    #include <stddef.h>
    #include <stdlib.h>
    
    #pragma pack(1)
    typedef struct meaning {
    char gap ;
    int address ;
    char remaining[5] ;
    }meaning;
    
    int main ( ) {
      printf("Size=%zu\n", sizeof(meaning));   /* %zu is the size_t conversion */
      printf("Offset=%zu\n", offsetof(meaning,address));
      return 0;
    }
    $ gcc bar.c
    $ ./a.out 
    Size=10
    Offset=1
    Whilst this gets the int to the correct place, you'll find that any code which accesses that int will be "slugged" to take into account its unaligned position in memory.

    The most reliable method of solving your problem is something like
    Code:
    myInt = (int)buff[1] << 24 | (int)buff[2] << 16 | (int)buff[3] << 8 | (int)buff[4];
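    This works regardless of the host machine's endianness, so long as you know the endianness of the input message (the expression above assumes big-endian), buff holds unsigned char values, and int is at least 32 bits wide.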
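
    A slightly more defensive sketch of the same shift-and-or idea, assuming the message stores the integer most significant byte first. The function name read_be32 is made up for illustration. It takes unsigned char bytes so nothing is sign-extended, and it uses uint32_t so the shifts are well defined even where int is only 16 bits.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant first. */
uint32_t read_be32(const unsigned char *bytes)
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] <<  8) |
            (uint32_t)bytes[3];
}
```

    With the message from the first post, read_be32(&Message[1]) would be called on the bytes 0x00, 0x00, 0x00, 0x01, provided Message is declared as (or cast to) unsigned char.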

    If you know the endianness of the message is the same as the host's, you can short-cut to
    Code:
    memcpy( &myInt, &buff[1], sizeof(myInt) );
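
    The memcpy approach can be wrapped in a small helper. This is only a sketch: read_native_i32 is a hypothetical name, and it assumes the four bytes in the buffer are already in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Copy sizeof(int32_t) bytes starting at src into an aligned local.
   src may point anywhere; no alignment is required of it. */
int32_t read_native_i32(const void *src)
{
    int32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

    Because the destination is an ordinary local variable, the compiler never dereferences a misaligned int pointer; it is free to turn the copy into whatever load sequence the target allows.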
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Why not print out the address of um.data.address to see whether it is aligned on its natural boundary?
    As noted, struct members get internal padding to align each type on its natural boundary, and
    tail padding to make the struct size a multiple of the alignment of its most restrictive member, which in this case is the int.
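
    One way to do that check in code rather than by eye, as a rough sketch. The helper is_aligned and the union tag message_view are hypothetical names; the pointer-to-integer conversion is implementation-defined, though on common flat-address targets it gives the numeric address.

```c
#include <stddef.h>
#include <stdint.h>

struct meaning {
    char gap;
    int  address;
    char remaining[5];
};

/* Hypothetical tag; the original post leaves the union unnamed. */
union message_view {
    char           message[10];
    struct meaning data;
};

/* Return 1 if ptr is a multiple of alignment (alignment must be nonzero). */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

    Calling is_aligned(&um.data.address, _Alignof(int)) on a union object should report 1, whereas the same call on &Message[1] from the first post may well report 0.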

  8. #8
    Registered User
    Join Date
    Apr 2011
    Posts
    8

    no padding

    No, I can assure you no padding is added. You see, I have a char vector that I memcpy into the char vector part of the union, and then I can access the second, third, fourth and fifth bytes through the second part of the union, where I see them as an integer. Marvellous. But I don't understand, because the compiler/hardware doesn't let me do it if I use a pointer. So, as far as I understand, I am really accessing an unaligned int, with the trick of using a union. Does that make sense?

  9. #9
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by mhrodri
    No, I can assure you no padding is added.
    The compiler pads the struct behind the scenes, not the programmer.
    Quote Originally Posted by mhrodri
    You see, I have a char vector that I memcpy into the char vector part of the union, and then I can access the second, third, fourth and fifth bytes through the second part of the union, where I see them as an integer. Marvellous.
    Yep! Because union members share the same storage, memcpy() can fill um using char Message[] as the source.
    Quote Originally Posted by mhrodri
    But I don't understand, because the compiler/hardware doesn't let me do it if I use a pointer. So, as far as I understand, I am really accessing an unaligned int, with the trick of using a union. Does that make sense?
    Nope! You're mistaken, because the int is aligned. The previously defined struct takes care of that.
    Print out the address in both cases and you'll see that one is nonaligned while the union one isn't.
    Last edited by itCbitC; 04-12-2011 at 03:05 PM.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Without you saying which compiler you're using, and taking your word that the size of your structure is 10 bytes, then the compiler may (previously, I said will) be generating special code to access the unaligned integer.

    If you're on an x86, it is a somewhat unusual processor in that it does allow unaligned accesses. After a bit of digging around, it seems that on the earlier models (before the 386) there was a performance hit of a few clock cycles for accessing unaligned words. The modern processors all seem to manage it in one cycle anyway.

  11. #11
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Some micro-controllers may not allow unaligned multi-byte access. The addressing just isn't set up to allow odd byte addresses. In that case the compiler intended for that platform is right to issue severe error messages when this is attempted, because it won't be able to generate correct machine code.

  12. #12
    Registered User
    Join Date
    Apr 2011
    Posts
    8

    more...

    Well, I can confirm the following, after a more detailed review. Let me rewrite the example:
    char original_data[10] = { 0x7F, 0x00, 0x00, 0x00, 0x01, 0 } ;

    Bytes two through five define an integer. The following will NOT work:
    int addr = *(int *)&original_data[1] ;

    but the following DOES work:
    union {
    char bytes[10] ;
    struct {
    char header ;
    int address ;
    } interpret ;
    } u_data ;
    memcpy (u_data.bytes, original_data, 10) ;
    addr = u_data.interpret.address ;

    With this latter code you can access an unaligned int. I have looked at the assembler generated by the compiler, and I see the difference: I can tell that the compiler accesses the int correctly because I saw shifts and two accesses.
    So, sadly, the lazy compiler does not make any effort when dealing with pointers, but with unions it really accesses what we expect. Why would that be? Perhaps strict adherence to the standard?
    Thank you everyone for your time.

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    It might actually help if you told us which compiler you're using.

    FWIW, some embedded compilers may favour compactness of data structures at the expense of access speed. There should (may) be an option to modify that behaviour.

    > So, sadly, the lazy compiler does not make any effort when dealing with pointers
    Why should it kill the performance of every pointer access, just on the off-chance it may be unaligned?

  14. #14
    Registered User
    Join Date
    Apr 2011
    Posts
    8
    avr-gcc, the compiler

    ok, thank you

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    I had a look through my local gcc manual for the variable attributes, hoping to be able to do something like this.
    But it would seem to be not the right thing to do.

    Code:
    int *mySpecialPointer __attribute__ ((aligned (1)));
    mySpecialPointer = (int*)&original_data[1] ;
    int addr = *mySpecialPointer;
    The general idea being to signal to the compiler that this particular int pointer is a bit special, and to generate more conservative code when it comes to dereferencing it.

    You can do a lot of weird stuff with attributes, so it might be worth checking whether your local compiler (tuned for AVR specifically) has such a feature.
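
    For what it's worth, one idiom sometimes used with GCC-compatible compilers for this purpose is to wrap the int in a packed struct and read through that. The sketch below is a compiler extension, not standard C, and the names unaligned_int and read_unaligned_int are hypothetical; the memcpy approach from earlier in the thread remains the portable choice.

```c
/* GCC/Clang extension: a packed struct has alignment 1, so the compiler
   has to emit code for reading value that is safe at any address. */
struct unaligned_int {
    int value;
} __attribute__((packed));

/* Read an int from an address that may not be int-aligned. */
int read_unaligned_int(const void *src)
{
    const struct unaligned_int *p = src;
    return p->value;
}
```

    Here the "special" property is attached to the pointed-to type rather than to the pointer variable, which is why the dereference gets the conservative treatment. Whether a particular compiler accepts this attribute is something to confirm in its own manual.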
