Thread: sizeof applied to structure...

  1. #16
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Quote Originally Posted by roaan View Post
    Just a question: when I have a structure declaration like

    Code:
    struct node
    {
    	int data;
    	//struct node *link;
    	char ch;
    };
    and I use sizeof(struct node), sizeof returns 8 though I expected 5. The reason mentioned is

    "The reason for this is that most compilers, by default, align complex data-structures to a word alignment boundary", but I am unable to get what is meant by word alignment boundary.
    Quote Originally Posted by N1256 3.2#1
    alignment
    requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address
    Different data types have different alignment requirements. Most of the time, these requirements are dictated by the architecture the program runs on. However, a compiler could conceivably emulate unaligned accesses by, for instance, performing multiple reads or writes and assembling the parts itself. On some platforms, like x86, unaligned accesses work but are slower than aligned ones. On other systems, they make the program crash. So, naturally, the compiler lays out the members of a struct so that accesses to them are aligned.

    In the case of your struct, it's likely that an int is 4 bytes and is aligned to a 4-byte boundary. The char is 1 byte and aligned to a 1-byte boundary. Thus, the char can be located at any byte address, but an aligned int can only begin at an address that is a multiple of 4.

    The compiler has to account for the possibility of the struct being an element of an array. Since array elements have to be contiguous, the compiler has to add padding somewhere in the struct so that the int member of array[1] and of every later element is properly aligned. This means an extra 3 bytes of padding. It could be inserted either between the int and char members or between the char member and the end, but not before the first member. This explains why the struct has size 8.
    Quote Originally Posted by Cooloorful View Post
    Unions are also capable (and a more portable solution) of changing the alignment of a structure.

    Code:
    union node
    {
      struct
      {
        int data;
        char c;
      };
    
      char raw[5];
    };
    Putting a struct in a union with a char array would typically do nothing to affect alignment.
    Quote Originally Posted by roaan View Post
    So is it that, to keep memory allotted in multiples of 4, the compiler allocates 4 bytes for the char as well?

    But then if I were to have a single char like

    char c;
    and then use sizeof(char), it would return 1.

    Why does it not return 4 here too? Is word alignment not an issue here?
    The extra 3 bytes are not assigned to the char, but to the struct in the form of padding.
    Quote Originally Posted by Cooloorful View Post
    Code:
    void function1(void)
    {
      int c;
      char d;
      float h;
    
      printf("c is %d bytes\nb is %d bytes\nh is %d bytes\n", sizeof c, sizeof d, sizeof h);
      printf("The distance from c to h is %d bytes", (char *)(&h + 1) - (char *)(&c));
    }
    The problem with data being aligned on weird boundaries is that it makes the stack run less efficiently since it keeps needing to realign the stack to different boundaries.

    Code:
    void function2(void)
    {
      struct {
        int c;
        char d;
        float h;
      } t;
    
      printf("t.c is %d bytes\nt.b is %d bytes\nt.h is %d bytes\nt is %d bytes\n", sizeof t.c, sizeof t.d, sizeof t.h, sizeof t);
      printf("The distance from t.c to t.h is %d bytes", (char *)(&t.h + 1) - (char *)(&t.c));
    }
    sizeof yields a value of type size_t, which is an unsigned integer type, whereas %d expects a signed int. In C89, the closest you can get to a correct format specification is %lu with the argument cast to unsigned long. C99 adds the format specification %zu for size_t. Subtracting two pointers yields a ptrdiff_t, not the int your format specification suggests; the correct specification is %td (also C99). However, the subtraction in function1 invokes undefined behavior, because 'pointer - pointer' is only defined when both pointers point into the same array object (or one past its end), and c and h are separate objects. In function2, where both members belong to the same object t, offsetof gives the distance without any pointer arithmetic.
    Quote Originally Posted by Elysia View Post
    Ah, but the question is, do I really have to know?
    Assembler is not a very friendly language, after all, and being as much machine-independent as possible makes for very portable code.

    I know people tend to ask stupid questions sometimes. Perhaps they should attend a hardware engineering course of some sort. But I don't think using assembler beforehand is a good thing.
    Depending on the objective, assembler might make it easier for the person to accomplish his or her task, and therefore be the simpler choice.

  2. #17
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    Thanks for the information, robwhit. I was not aware of %zu. Nifty. As for the assembler thing, I am merely pointing out that the reasons why your compiler aligns things on word, dword, qword (etc.) boundaries are entirely hidden outside of assembler. In assembler, when a variable is on some ugly boundary, you have to realign registers and go through a painstaking ordeal to use and reuse variables on inefficient boundaries. I am by no means advocating that people learn to write programs in assembler.
    wipe on -
    A slap on the hand is better than a slap on the face. A tragic lesson learned far too late in life.
    - wipe off

  3. #18
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by Cooloorful View Post
    Thanks for the information, robwhit. I was not aware of %zu. Nifty. As for the assembler thing, I am merely pointing out that the reasons why your compiler aligns things on word, dword, qword (etc.) boundaries are entirely hidden outside of assembler. In assembler, when a variable is on some ugly boundary, you have to realign registers and go through a painstaking ordeal to use and reuse variables on inefficient boundaries. I am by no means advocating that people learn to write programs in assembler.
    Well right, you may not want to program in assembly, but it's still useful to understand how it works.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  4. #19
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    It only helps your programming abilities to know how it works, as Sebastiani said. It's analogous to knowing how an automatic transmission works. Sure, it's a highly complicated piece of equipment that is best not tinkered with by the faint of heart, but simply knowing how it works mechanically can be essential in diagnosing a grinding sound whenever you accelerate.

  5. #20
    Registered User
    Join Date
    Jun 2009
    Location
    US of A
    Posts
    305
    That's great. I got more than what I asked for :-). I think I can look up a little bit on the assembler side as well (not the programming of it, but definitely a small tutorial).

    Can someone direct me to a good link for the same? :-)

  6. #21
    Registered User Cooloorful's Avatar
    Join Date
    Feb 2009
    Posts
    59
    Actually some of the tutorials that come with Masm32 as well as the loads of example programs should suffice. Good tutorials are hard to come by these days.

  7. #22
    Making mistakes
    Join Date
    Dec 2008
    Posts
    476
    S/he probably wants to do some binary input/output. And I say: don't! Don't even fread your shorts. A far better solution would be:

    Code:
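    // The file is in little-endian format: the low byte comes first.
    // Read into separate variables, because the order in which the two
    // operands of | are evaluated is unspecified.
    int lo = fgetc(file);
    int hi = fgetc(file);
    structure.size = lo | (hi << 8);
    
    // The file is in big-endian format: the high byte comes first.
    hi = fgetc(file);
    lo = fgetc(file);
    structure.size = (hi << 8) | lo;
    
    // No way! Even when structure is #pragma packed. Maybe someone is compiling this under Comeau's?
    fread(&structure, sizeof structure, 1, file);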
    And if you don't use pack, things will probably go faster. Many architectures can read word-aligned data _much_ faster. The performance difference between "(int)pointer" and "(int)(pointer + 1)" is really huge.

  8. #23
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    The difference between casting a pointer to an int and casting a pointer to the next element of the array to int is really huge? There is a difference, but I don't think it's relevant to the discussion.

  9. #24
    Making mistakes
    Join Date
    Dec 2008
    Posts
    476
    No. I didn't know 5 people posted while I was writing!

    Anyway, the thing I wanted to show is that reading from an odd memory address (without alignment) is not even half as fast as reading from an even one (on my machine). The best alignment is of course always 4 bytes on a Pentium (8 for doubles and long doubles).

    Did I forget to mention that pointer was a "char *"? And I made other mistakes:

    Code:
    char *pointer = malloc(sizeof(int) + 1);
    *(int *)pointer;          // Fast if the compiler doesn't optimize it away
    *(int *)(pointer + 1);    // Slow! Misaligned on most platforms, which is undefined behavior in standard C
    It's like when I learned that unbuffered IO was _slow_, too. My program didn't even finish, I just quit it because of boredom.

    If you try to circumvent what the compiler thinks is best, you'll run into trouble, especially if the struct is heavily used.

  10. #25
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    No. I didn't know 5 people posted while I was writing!
    It took you 14.5 hours to write that post?
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  11. #26
    Making mistakes
    Join Date
    Dec 2008
    Posts
    476
    I don't think so. Today is Friday, right?

    Or I didn't see the next page. Sorry for that.

  12. #27
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Quote Originally Posted by roaan View Post
    So is it that, to keep memory allotted in multiples of 4, the compiler allocates 4 bytes for the char as well?

    But then if I were to have a single char like

    char c;
    and then use sizeof(char), it would return 1.

    Why does it not return 4 here too? Is word alignment not an issue here?
    If I may augment robwhit's explanation...

    Yes, word alignment is an issue here too. Compilers, in the interest of memory access efficiency, may align variables to a word boundary (whatever the word size of the native machine is). The compiler is still obligated to return the correct "size" of the storage type.

    You can test this by declaring several chars (char a, b, c) and then displaying the address of each. I bet they will be word-aligned.

    It's a very good point you bring up though. Why would the compiler include padding in its calculations for a structure, but not for simple types?

    Perhaps the idea is that when either is used in an array, the padding in the structure is included in the element size. Thus each element is guaranteed to start at a memory location favorable to 32-bit addressing, whereas for single bytes (char) no memory-friendly alignment is assumed.
    (sorry, I posted this before I knew there was page 2 of this thread and read robwhit's explanation of possible array use... which I came up with on my own as well)
    Last edited by nonoob; 08-01-2009 at 01:14 AM.
