sizeof applied to structure...

**robwhit** · 07-29-2009

Originally Posted by roaan

Just a question: when I have a structure declaration like

Code:

struct node
{
	int data;
	//struct node *link;
	char ch;
};

and I use sizeof(struct node), sizeof returns 8 though it should have been 5. The reason given is

"The reason for this is that most compilers, by default, align complex data-structures to a word alignment boundary", but I am unable to understand what is meant by a word alignment boundary.

Originally Posted by N1256 3.2#1

alignment
requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address

Different data types have different alignment requirements. Most of the time these requirements are dictated by the architecture the program runs on. However, a compiler could conceivably support unaligned accesses itself by, for instance, making multiple reads/writes and assembling the parts. On some platforms, like x86, unaligned accesses work but are slower than aligned accesses. On other systems, they make the program crash. So, naturally, the compiler wants to lay out the members of a struct so that accesses to them are aligned.

In the case of your struct, it's likely that an int is 4 bytes and is aligned to a 4-byte boundary. The char is 1 byte and aligned to a 1-byte boundary. Thus, the char can be located at any byte address, but an aligned int can only begin at an address that is a multiple of 4.

The compiler has to account for the possibility of the struct being an element of an array. Since arrays have to be contiguous, the compiler has to add padding somewhere in the struct so that the int member of array[1] and later elements is properly aligned. This means an extra 3 bytes of padding. It could be inserted either between the int and char members or between the char member and the end, but not before the first member. This explains why the struct has size 8.
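To see where the padding ends up on a given compiler, something like the following sketch can help. It assumes a C99 compiler; the helper names node_padding and print_node_layout are made up for illustration, and the numbers it prints depend on the platform.

```c
#include <stddef.h>
#include <stdio.h>

struct node {
    int data;
    char ch;
};

/* Number of bytes in struct node that belong to neither member. */
size_t node_padding(void)
{
    return sizeof(struct node) - (sizeof(int) + sizeof(char));
}

/* Print the member offsets, the total size and the array stride. */
void print_node_layout(void)
{
    struct node array[2];

    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    printf("offset of data      = %zu\n", offsetof(struct node, data));
    printf("offset of ch        = %zu\n", offsetof(struct node, ch));
    printf("padding bytes       = %zu\n", node_padding());
    /* Array elements are contiguous, so the stride is sizeof(struct node). */
    printf("array stride        = %td\n",
           (char *)&array[1] - (char *)&array[0]);
}
```

Calling print_node_layout() from main on a machine with a 4-byte, 4-byte-aligned int would be expected to report 3 padding bytes after ch.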

Originally Posted by Cooloorful

Unions can also change the alignment of a structure, and are a more portable solution.

Code:

union node
{
  struct
  {
    int data;
    char c;
  };

  char raw[5];
};

Putting a struct in a union with a char array would typically do nothing to affect alignment.
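One way to check this is to compare the sizes directly. The sketch below uses a named struct member instead of the anonymous one so that it also compiles as C99; the names pair, node_u and print_union_layout are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

struct pair {
    int data;
    char c;
};

union node_u {
    struct pair s;
    char raw[5];
};

/* Print both sizes so they can be compared on the current platform. */
void print_union_layout(void)
{
    printf("sizeof(struct pair)  = %zu\n", sizeof(struct pair));
    printf("sizeof(union node_u) = %zu\n", sizeof(union node_u));
}
```

With a 4-byte int the two sizes are likely to be identical, because the union still has to satisfy the alignment of its int member.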

Originally Posted by roaan

So is it that, to keep memory allotted in multiples of 4, the compiler allocates 4 bytes for the char as well?

But then if I were to have a single char like

char c;
and then use sizeof(char), it would return 1.

Why does it not return 4 even here? Is word alignment not an issue here?

The extra 3 bytes is not assigned to the char, but to the struct in the form of padding.
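A small sketch of that distinction, assuming a C99 compiler (print_char_sizes is a hypothetical helper name): the char itself stays 1 byte wherever it appears, and only the enclosing struct grows.

```c
#include <stdio.h>

struct node {
    int data;
    char ch;
};

/* Compare the size of a char on its own, in an array, and in a struct. */
void print_char_sizes(void)
{
    char c = 'x';
    char four[4] = "abc";
    struct node n = {0, 'x'};

    printf("sizeof c    = %zu\n", sizeof c);     /* always 1 by definition */
    printf("sizeof four = %zu\n", sizeof four);  /* 4 chars, no padding */
    printf("sizeof n.ch = %zu\n", sizeof n.ch);  /* still 1 inside the struct */
    printf("sizeof n    = %zu\n", sizeof n);     /* members plus any padding */
}
```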

Originally Posted by Cooloorful

Code:

void function1(void)
{
  int c;
  char d;
  float h;

  printf("c is %d bytes\nd is %d bytes\nh is %d bytes\n", sizeof c, sizeof d, sizeof h);
  printf("The distance from c to h is %d bytes", (char *)(&h + 1) - (char *)(&c));
}

The problem with data being aligned on weird boundaries is that it makes the stack run less efficiently since it keeps needing to realign the stack to different boundaries.

Code:

void function2(void)
{
  struct {
    int c;
    char d;
    float h;
  } t;

  printf("t.c is %d bytes\nt.d is %d bytes\nt.h is %d bytes\nt is %d bytes\n", sizeof t.c, sizeof t.d, sizeof t.h, sizeof t);
  printf("The distance from t.c to t.h is %d bytes", (char *)(&t.h + 1) - (char *)(&t.c));
}

sizeof returns type size_t, which is an unsigned integer type. %d is for signed int. If you are using C89, the closest you can get to a correct format specification is to use %lu and cast the argument to unsigned long. In C99, there is the format specification %zu.

Subtracting two pointers results in a ptrdiff_t, not an int, as your format specification suggests. The correct format specification would be %td. However, the expressions that evaluate to the ptrdiff_t types invoke undefined behavior because 'pointer - pointer' is only defined when both pointers point to the same array object.
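For the struct version, a C99 sketch that avoids both problems could use %zu for the sizes and offsetof for the distance, so no pointers are subtracted at all. The struct tag triple and the helper span_c_to_h are names invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

struct triple {
    int c;
    char d;
    float h;
};

/* Bytes from the start of member c to the end of member h. */
size_t span_c_to_h(void)
{
    return offsetof(struct triple, h) + sizeof(float)
           - offsetof(struct triple, c);
}

void function2(void)
{
    struct triple t = {0, 0, 0.0f};

    /* sizeof yields size_t, so %zu is the matching C99 conversion. */
    printf("t.c is %zu bytes\nt.d is %zu bytes\nt.h is %zu bytes\nt is %zu bytes\n",
           sizeof t.c, sizeof t.d, sizeof t.h, sizeof t);
    printf("The distance from t.c to t.h is %zu bytes\n", span_c_to_h());
}
```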

Originally Posted by Elysia

Ah, but the question is, do I really have to know?
Assembler is not a very friendly language, after all, and being as machine-independent as possible makes for very portable code.

I know people tend to ask stupid questions sometimes. Perhaps they should attend a hardware engineering course of some sort. But I don't think starting with assembler is a good thing.

Depending on the objective, assembler might make it easier for the person to accomplish his/her task.

**Cooloorful** · 07-29-2009

Thanks for the information, robwhit. I was not aware of %zu. Nifty. As for the assembler thing, I am merely pointing out that the reasons why your compiler aligns things on word, dword, qword (etc.) boundaries are entirely hidden outside of assembler. In assembler, when a variable is on some ugly boundary, you have to realign registers and go through a painstaking ordeal to use and reuse variables on inefficient boundaries. I am by no means advocating that people learn to write programs in assembler.

**Sebastiani** · 07-29-2009

Originally Posted by Cooloorful

Thanks for the information, robwhit. I was not aware of %zu. Nifty. As for the assembler thing, I am merely pointing out that the reasons why your compiler aligns things on word, dword, qword (etc.) boundaries are entirely hidden outside of assembler. In assembler, when a variable is on some ugly boundary, you have to realign registers and go through a painstaking ordeal to use and reuse variables on inefficient boundaries. I am by no means advocating that people learn to write programs in assembler.

Well right, you may not want to program in assembly, but it's still useful to understand how it works.

**Cooloorful** · 07-29-2009

It only helps your programming abilities to know how it works, as Sebastiani said. It's analogous to knowing how an automatic transmission works. Sure, it's a highly complicated piece of equipment that is best not tinkered with by the faint of heart, but simply knowing how it works mechanically can be essential in diagnosing a grinding sound whenever you accelerate.

**roaan** · 07-29-2009

That's great. I got more than what I asked for :-). I think I can look up a little bit on the assembler side as well (not the programming of it, but definitely a small tutorial).

Can someone direct me to a good link for the same? :-)

**Cooloorful** · 07-29-2009

Actually some of the tutorials that come with Masm32 as well as the loads of example programs should suffice. Good tutorials are hard to come by these days.

**Brafil** · 07-30-2009

S/he probably wants to do some binary in/output. And I say: don't! Don't even fread your shorts. A far better solution would be:

Code:

// The file is in little-endian format
structure.size = fgetc(file) | (fgetc(file) << 8);

// The file is in big-endian format
structure.size = (fgetc(file) << 8) | fgetc(file);

// No way! Even when structure is #pragma packed. Maybe someone is compiling this under Comeau's?
fread(&structure, sizeof structure, 1, file);
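One caveat with the one-line versions above: C does not specify which operand of | is evaluated first, so the two fgetc calls could run in either order. A sketch that forces the order and also checks for EOF might look like this (read_u16_le and read_u16_be are hypothetical helper names, not part of any library):

```c
#include <stdio.h>

/* Read a 16-bit little-endian value; returns -1 on EOF or read error. */
long read_u16_le(FILE *file)
{
    int lo = fgetc(file);            /* first byte in the file: low bits */
    if (lo == EOF)
        return -1;
    int hi = fgetc(file);            /* second byte: high bits */
    if (hi == EOF)
        return -1;
    return (long)lo | ((long)hi << 8);
}

/* Read a 16-bit big-endian value; returns -1 on EOF or read error. */
long read_u16_be(FILE *file)
{
    int hi = fgetc(file);            /* first byte in the file: high bits */
    if (hi == EOF)
        return -1;
    int lo = fgetc(file);            /* second byte: low bits */
    if (lo == EOF)
        return -1;
    return ((long)hi << 8) | (long)lo;
}
```

A caller would then write something like structure.size = read_u16_le(file) after checking that the result is not -1.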

And if you don't use pack, things will probably go faster. Many architectures can read word-aligned data _much_ faster. The performance difference between "(int)pointer" and "(int)(pointer + 1)" is really huge.

**robwhit** · 07-30-2009

The difference between casting a pointer to an int and casting a pointer to the next element of the array to an int is really huge? There is a difference, but I don't think it's relevant to the discussion.

**Brafil** · 07-30-2009

No. I didn't know 5 people posted while I was writing!

Anyway, the thing I wanted to show is that reading from an odd memory address (without alignment) is not even half as fast as reading from an even one (on my machine). The best is of course always 4 bytes on a Pentium (8 for doubles and long doubles).

Did I forget to mention that pointer was a "char *"? And I made other mistakes:

Code:

char *pointer = malloc(sizeof(int) + 1);
*(int *)pointer;          // Fast if the compiler doesn't optimize it away
*(int *)(pointer + 1);  // Slow!

It's like when I learned that unbuffered IO was _slow_, too. My program didn't even finish, I just quit it because of boredom.

If you try to circumvent what the compiler thinks is best, you'll run into trouble, especially if the struct is heavily used.
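As a side note, dereferencing a misaligned int pointer like the second line of the snippet above is undefined behavior in C, even on machines where it merely runs slowly. A portable way to fetch an int from an arbitrary byte address is to copy it out with memcpy; load_int below is a hypothetical helper name used only for this sketch.

```c
#include <string.h>

/* Load an int from any byte address without an unaligned dereference. */
int load_int(const unsigned char *p)
{
    int value;
    memcpy(&value, p, sizeof value);  /* byte-wise copy, no alignment needed */
    return value;
}
```

Compilers commonly turn this into a single load where the hardware allows it, though that is an optimization and not something the language guarantees.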

**Cactus_Hugger** · 07-30-2009

No. I didn't know 5 people posted while I was writing!

It took you 14.5 hours to write that post?

**Brafil** · 07-31-2009

I don't think so. Today is Friday, right?

Or I didn't see the next page. Sorry for that.

**nonoob** · 08-01-2009

Originally Posted by roaan

So is it that, to keep memory allotted in multiples of 4, the compiler allocates 4 bytes for the char as well?

But then if I were to have a single char like

char c;
and then use sizeof(char), it would return 1.

Why does it not return 4 even here? Is word alignment not an issue here?

If I may augment robwhit's explanation...

Yes, word alignment is an issue here too. Compilers, in the interest of memory access efficiency, may align variables to a word boundary (whatever the native word size of the machine is). The compiler is still obligated to return the correct "size" of the storage type.

You can test this by declaring several chars - char a, b, c, and then displaying the addresses of each. I bet they will be word-aligned.
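A sketch of that test follows. Where separate char variables land is entirely up to the compiler, so the printed addresses may or may not be word-aligned. The elements of a char array, by contrast, are guaranteed to sit exactly 1 byte apart. Both function names are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Print the addresses of three separate char variables. */
void print_char_addresses(void)
{
    char a = 'a', b = 'b', c = 'c';

    printf("&a = %p\n&b = %p\n&c = %p\n",
           (void *)&a, (void *)&b, (void *)&c);
}

/* Distance in bytes between neighbouring elements of a char array. */
ptrdiff_t char_array_stride(void)
{
    char arr[3] = {'a', 'b', 'c'};

    return &arr[1] - &arr[0];
}
```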

It's a very good point you bring up though. Why would the compiler include padding in its calculations for a structure, but not for simple types?

Perhaps the idea is that when either is used in an array, the padding in the structure is included to calculate the element size. Thus each element is guaranteed to start on a memory location favorable to 32-bit addressing. Whereas for single bytes (char), there is no friendly-to-memory alignment assumed.
(sorry, I posted this before I knew there was page 2 of this thread and read robwhit's explanation of possible array use... which I came up with on my own as well)
