Thread: sizeof applied to structure...

  1. #1
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305

    sizeof applied to structure...

    Just a question. When I have a structure declaration like

    Code:
    struct node
    {
    	int data;
    	//struct node *link;
    	char ch;
    };
    and I use sizeof(struct node), it returns 8, though I expected 5. The reason I have seen mentioned is

    "The reason for this is that most compilers, by default, align complex data-structures to a word alignment boundary", but I am unable to understand what is meant by a word alignment boundary.

  2. #2
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    A word is 2 bytes (in x86 terminology). 32-bit systems more often align to a double word (4 bytes). #pragma pack() is pretty commonplace for overriding the default boundaries, though I would only do that when you know it's what you should be doing. Unions are also capable of changing the alignment of a structure (and are a more portable solution).

    Code:
    union node
    {
      struct
      {
        int data;
        char c;
      };
    
      char raw[5];
    };
    Last edited by Cooloorful; 07-29-2009 at 03:10 PM.
    wipe on -
    A slap on the hand is better than a slap on the face. A tragic lesson learned far too late in life.
    - wipe off
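A quick way to see what the union does to the size is simply to print it. The sketch below is an illustration only, not part of the original post: the names `node_plain`, `node_union`, `fields` and `print_union_sizes` are made up, and the inner struct is given a name so the code also compiles before C11. A union is at least as large as its largest member and keeps the strictest alignment among its members, so on a typical platform with a 4-byte int both sizes will most likely come out as 8 rather than 5.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
struct node_plain {
    int  data;
    char c;
};

union node_union {
    struct {
        int  data;
        char c;
    } fields;          /* named here; the post uses an anonymous struct */
    char raw[5];
};

/* Prints both sizes; the exact numbers depend on the platform. */
void print_union_sizes(void)
{
    printf("struct: %zu bytes\n", sizeof(struct node_plain));
    printf("union:  %zu bytes\n", sizeof(union node_union));
}
```

Calling `print_union_sizes()` from `main` is enough to try it. If the two numbers match, the union has not removed any padding; it only gives a byte-wise view of the same storage.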

  3. #3
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    So is it that, to keep memory allocated in multiples of 4, the compiler allocates 4 bytes for the char as well?

    But then if I were to have a single char like

    char c;
    and then use sizeof(char), it would return 1.

    Why does it not return 4 even here? Is word alignment not an issue here?
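The difference comes from alignment requirements rather than from a rule that everything is rounded up to 4. A char has an alignment requirement of 1, so a lone char needs no padding. An int typically needs 4-byte alignment, and a struct containing one is padded at the end so that every element in an array of that struct keeps its int aligned. A minimal sketch, assuming a C11 compiler (for `_Alignof`); `show_alignment` is a made-up helper name:

```c
#include <stdio.h>

struct node {
    int  data;
    char ch;
};

/* Hypothetical helper: prints size and alignment of char, int and struct node.
   The numbers are platform-dependent; 1/1, 4/4 and 8/4 are typical. */
void show_alignment(void)
{
    printf("char:        size %zu, align %zu\n",
           sizeof(char), _Alignof(char));
    printf("int:         size %zu, align %zu\n",
           sizeof(int), _Alignof(int));
    printf("struct node: size %zu, align %zu\n",
           sizeof(struct node), _Alignof(struct node));
}
```

The size of a struct is always a multiple of its alignment, which is why 5 bytes of members typically become 8.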

  4. #4
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
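    Word alignment here means that members which need it, such as a 4-byte int on a typical 32-bit system, are placed on 4-byte boundaries. A char has no such requirement and can sit at any offset.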

    To illustrate this better, look at this:

    Code:
    struct s1
    {
        char c1; 
        char c2; 
        char c3; 
        char c4; 
        int i1; 
        int i2; 
        int i3; 
        int i4; 
    };
    
    struct s2
    {
        char c1; 
        int i1; 
        char c2; 
        int i2; 
        char c3; 
        int i3; 
        char c4; 
        int i4; 
    };
    Even though s1 and s2 hold the same members, s1 will be 20 bytes and s2 will be 32 bytes (assuming a 4-byte int). This is because in s1 all 4 chars fit into a single word, so together they take up just 4 bytes. In s2 each char effectively takes up 4 bytes, because 3 bytes of padding follow it so that the int declared after it is word aligned.
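The padding can be made visible with `offsetof` from `<stddef.h>`. This is just a sketch built on the two structs above; `print_layout` is a made-up helper name, and the numbers it prints depend on the platform.

```c
#include <stdio.h>
#include <stddef.h>

struct s1 {
    char c1; char c2; char c3; char c4;
    int  i1; int  i2; int  i3; int  i4;
};

struct s2 {
    char c1; int i1;
    char c2; int i2;
    char c3; int i3;
    char c4; int i4;
};

/* Hypothetical helper: prints where selected members land and the total sizes. */
void print_layout(void)
{
    printf("s1: c4 at %zu, i1 at %zu, size %zu\n",
           offsetof(struct s1, c4), offsetof(struct s1, i1),
           sizeof(struct s1));
    printf("s2: c1 at %zu, i1 at %zu, c2 at %zu, size %zu\n",
           offsetof(struct s2, c1), offsetof(struct s2, i1),
           offsetof(struct s2, c2), sizeof(struct s2));
}
```

Assuming a 4-byte int with 4-byte alignment, I would expect i1 to land at offset 4 in both structs, c2 at offset 8 in s2, and the sizes to be 20 and 32.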

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by Cooloorful View Post
    #pragma pack() is pretty common place for overriding the default boundaries.
    Careful. That pragma is a compiler extension, not standard C (GCC and MSVC both have it, but others may not).
    Nearly all pragmas are compiler-specific and there's no mention of what compiler the OP is using.
    Also, don't mess with the padding if you don't have a good reason to. The compiler does it for a reason (performance).
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  6. #6
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    Quote Originally Posted by bithub View Post
    Word alignment means that the structure's members are aligned on 4 byte boundaries.


    That makes it a lot clearer, and it makes sense as well. So the order of the members matters when declaring structures. Strange though :-)

  7. #7
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    Quote Originally Posted by Cooloorful View Post
    A word is 2 bytes. 32-bit systems more often align to a double word (4 bytes). #pragma pack() is pretty common place for overriding the default boundaries. Though I would only do that when you know its what you should be doing. Unions are also capable (and a more portable solution) of changing the alignment of a structure.

    #pragma pack()? I am hearing of this for the first time.

  8. #8
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    *sigh* I love how when this topic clicks, all is well in the universe. If kids learned assembler first nowadays I think perhaps it would be more intuitive.

    A char is one byte.

    Code:
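    #include <stdio.h>
    
    void function1(void)
    {
      int c;
      char d;
      float h;
    
      /* sizeof yields a size_t, so print it with %zu */
      printf("c is %zu bytes\nd is %zu bytes\nh is %zu bytes\n", sizeof c, sizeof d, sizeof h);
      /* a pointer difference is a ptrdiff_t (%td); subtracting pointers to separate
         objects is not strictly defined, so treat this as a quick experiment only */
      printf("The distance from c to h is %td bytes\n", (char *)(&h + 1) - (char *)(&c));
    }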
    The problem with data being aligned on weird boundaries is that it makes access less efficient: misaligned reads and writes are slower on many CPUs (and not allowed at all on some), so the compiler keeps padding things out to suitable boundaries.

    Code:
    #include <stdio.h>
    
    void function2(void)
    {
      struct {
        int c;
        char d;
        float h;
      } t;
    
      /* sizeof yields a size_t (%zu); a pointer difference is a ptrdiff_t (%td) */
      printf("t.c is %zu bytes\nt.d is %zu bytes\nt.h is %zu bytes\nt is %zu bytes\n", sizeof t.c, sizeof t.d, sizeof t.h, sizeof t);
      printf("The distance from t.c to t.h is %td bytes\n", (char *)(&t.h + 1) - (char *)(&t.c));
    }
    The structure aligns data more efficiently. It's an invisible characteristic that won't affect your code at all. It does become an issue when parsing binary files, however.

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by Cooloorful View Post
    *sigh* I love how when this topic clicks, all is well in the universe. If kids learned assembler first nowadays I think perhaps it would be more intuitive.
    And go through the horrors of writing assembly code? No thanks.
    I think they should be content with the fact that the compiler can pad structs. For performance reasons.

  10. #10
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    Quote Originally Posted by roaan View Post
    #pragma pack()? I am hearing of this for the first time.
    Again, a more platform-independent way to accomplish the task is to use a union. But since you are asking, #pragma directives are simply directives that only specific compilers listen to. They are not standardized at all. The only thing standard about them is that they all have to begin with #pragma. pack is supported by both GCC and MSVC (though the syntax can vary) and simply tells the compiler to align the data how you are telling it to, not how it wants to.

    Example:
    Code:
    #pragma pack(1)
    
    struct node
    {
      struct node *next;
      char ch;
    };
    Now when you do your sizeof it should be 5 (assuming 4-byte pointers; with 8-byte pointers it would be 9).
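If packing really is needed, it is usually safer to limit it to the one structure, because a bare `#pragma pack(1)` stays in effect for every declaration that follows it in the file. Below is a hedged sketch using the `push`/`pop` form, which GCC, Clang and MSVC accept (other compilers may not). The names `packed_node`, `normal_node` and `print_packed_sizes` are made up for the example, and the sizes in the comments assume a 4-byte int.

```c
#include <stdio.h>
#include <stddef.h>

#pragma pack(push, 1)      /* save the current packing, then pack to 1 byte */
struct packed_node {
    int  data;
    char ch;
};
#pragma pack(pop)          /* restore whatever packing was active before */

struct normal_node {       /* declared after the pop: default padding applies */
    int  data;
    char ch;
};

/* Hypothetical helper: prints both sizes for comparison. */
void print_packed_sizes(void)
{
    printf("packed: %zu bytes\n", sizeof(struct packed_node)); /* typically 5 */
    printf("normal: %zu bytes\n", sizeof(struct normal_node)); /* typically 8 */
}
```

Keeping the pragma scoped like this means headers included later are not silently repacked.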

  11. #11
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    #pragma pack()? I am hearing of this for the first time.
    Not all compilers support this, and there is almost never a time where using it is a good idea.

  12. #12
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    Quote Originally Posted by Elysia View Post
    And go through the horrors of writing assembly code? No thanks.
    I think they should be content with the fact that the compiler can pad structs. For performance reasons.
    While I do not disagree with that, I think understanding why it does so is more a matter of accepting it as fact in a high-level language such as C/C++, whereas in assembler it's very intuitive and obvious.

  13. #13
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    Quote Originally Posted by bithub View Post
    Not all compilers support this, and there is almost never a time where using it is a good idea.
    Agreed! But whether or not you should use it, you are now informed that it does exist and how to use it.

  14. #14
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by Cooloorful View Post
    While I do not disagree with that, I think understanding why it does so is more a matter of accepting it as fact in a high-level language such as C/C++, whereas in assembler it's very intuitive and obvious.
    Ah, but the question is, do I really have to know?
    Assembler is not a very friendly language, after all, and being as machine-independent as possible makes for very portable code.

    I know people tend to ask stupid questions sometimes. Perhaps they should attend a hardware engineering course of some sort. But I don't think learning assembler first is a good thing.

  15. #15
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Quote:
    The structure aligns data more efficiently.
    Huh? No. Having your data in a struct does not magically bestow good karma onto it. Here's the graphical layout of your stack & structure examples, on my machine:
    Code:
    Stack:
    +-------+-----+-+-------+
    | h     | ?   |d| c     |
    +-------+-----+-+-------+
     0 1 2 3 4 5 6 7 8 9 a b
    
    Struct:
    +-------+-+-----+-------+
    | c     |d| ?   | h     |
    +-------+-+-----+-------+
     0 1 2 3 4 5 6 7 8 9 a b
    Furthermore, I get different answers for the stack one, depending on whether or not I actually use those variables. And this is with optimizations off.
    As for packing, there's usually not a use for that (I've never needed it). Write portable serialization code, and let your compiler have free rein over where your variables are.
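As a rough sketch of what portable serialization might look like for the struct from the first post: write each member into a byte buffer explicitly, in a fixed byte order, instead of copying the struct's memory. The function names, the little-endian choice and the 5-byte wire format are all assumptions made for this illustration, and the code assumes `data` fits in 32 bits.

```c
#include <stdint.h>
#include <stddef.h>

struct node {
    int  data;
    char ch;
};

enum { NODE_WIRE_SIZE = 5 };  /* 4 bytes for data + 1 byte for ch, no padding */

/* Hypothetical helper: stores data as 4 little-endian bytes, then ch. */
void node_serialize(const struct node *n, unsigned char buf[NODE_WIRE_SIZE])
{
    uint32_t v = (uint32_t)n->data;
    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
    buf[4] = (unsigned char)n->ch;
}

/* Hypothetical helper: the inverse of node_serialize. */
void node_deserialize(struct node *n, const unsigned char buf[NODE_WIRE_SIZE])
{
    uint32_t v = (uint32_t)buf[0]
               | ((uint32_t)buf[1] << 8)
               | ((uint32_t)buf[2] << 16)
               | ((uint32_t)buf[3] << 24);
    n->data = (int)(int32_t)v;
    n->ch   = (char)buf[4];
}
```

The file or network format is then 5 bytes per node regardless of how the compiler pads the struct in memory, so no packing pragma is needed.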
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)
