Alignment tutorial needed!!

**mynickmynick** · 09-03-2008

I urgently need to understand alignment issues and problems, starting from zero, in C or C++, on Linux with pthreads or on Microsoft platforms.
I apologize if I have already partially discussed this topic in other threads but, due to my ignorance, I did not manage to understand it, so it would be better to "reset and start again".

(1) When and why is alignment needed? When should one change the compiler/linker defaults?
(2) What about the alignment of arrays and structures?
(3) What about the chance that two threads disturb each other when they access two adjacent pieces of data?

Thank you very much

**matsp** · 09-04-2008

(1) When you know for sure that you know better than the compiler. This is rare. Typically, it is necessary when the natural alignment of your data is smaller than the alignment the hardware requires - for example you store data in a char arr[16], but you want to use SSE to access the data [efficiently], so you need an alignment of 16 bytes, whilst the natural alignment of the data type is 1. The same applies to an array double d[2] that you want to use with SSE - it needs to be aligned to 16 bytes, not the 8 that the compiler provides.
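A minimal sketch of the SSE case above, assuming a C11 compiler (alignas comes from <stdalign.h>). The names sse_bytes, sse_doubles and is_aligned are made up for illustration and are not from any library:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* char always has a natural alignment of 1. */
static_assert(alignof(char) == 1, "char has natural alignment 1");

/* alignas(16) raises the alignment of both arrays to the 16 bytes
   that aligned SSE loads and stores expect. */
static alignas(16) char sse_bytes[16];
static alignas(16) double sse_doubles[2];

/* Returns 1 if the address p is a multiple of n (n must be non-zero). */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

Calling is_aligned(sse_bytes, 16) and is_aligned(sse_doubles, 16) should both return 1. Without the alignas, the char array could start at any address.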

(2) Normally, arrays and structs are aligned to the largest alignment of any item in the array/struct. Internally, within the struct, each element is [normally] placed at its own natural alignment. Arrays themselves are aligned to the alignment of their elements - so an int array would typically be aligned to 4 bytes, whilst a double array would typically be aligned to 8 bytes.
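As a rough illustration of these rules, assuming a C11 compiler, the checks below are verified at compile time. The struct sample and its members are hypothetical; the actual offsets and sizes depend on the platform, so the checks use alignof and offsetof rather than fixed numbers:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct sample {
    char   c;   /* offset 0 */
    double d;   /* placed at the next multiple of alignof(double) */
    int    i;   /* placed at the next multiple of alignof(int) */
};

/* The struct is at least as strictly aligned as its most demanding member. */
static_assert(alignof(struct sample) >= alignof(double), "struct alignment");

/* Each member sits at a multiple of its own alignment; padding fills the gaps. */
static_assert(offsetof(struct sample, d) % alignof(double) == 0, "d aligned");
static_assert(offsetof(struct sample, i) % alignof(int) == 0, "i aligned");

/* sizeof includes tail padding so that every element of an array of
   struct sample keeps the same alignment. */
static_assert(sizeof(struct sample) % alignof(struct sample) == 0, "tail padding");
```

On a typical x86-64 system this struct occupies 24 bytes rather than the 13 bytes its members add up to; the difference is padding.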

(3) This does not happen on x86 [at least with respect to correctness of the result - performance may suffer if you get this wrong]. On other architectures, you'd have to figure out what the machine is capable of with regard to reading/writing parts of a natural word. It is often best to err on the side of caution here and use, for example, int, which is guaranteed to be easily accessible on any machine [Note that you may want to use a typedef here so that you can easily change the type depending on the architecture].

Note also that a lot of problems with threads can be avoided by storing each thread's own data in its own allocated memory block instead of, for example, in an array that is shared between threads. The internal alignment and spacing that happens naturally from allocating the memory will "automagically" remove the overlapping cache-line problems and such.

--
Mats

**mynickmynick** · 09-04-2008

Originally Posted by matsp

(1) SSE

(3) This does not happen on x86 [at least with respect to correctness of the result - performance may suffer if you get this wrong]. Mats

Thank you.
(1) What is SSE?
(3) x86 includes Core 2 Duo?

**matsp** · 09-04-2008

SSE = Streaming SIMD Extensions, a set of "new" instructions on x86 processors, first introduced as SSE1 in Intel's Pentium 3, then extended further in Pentium 4; Core 2 has SSE3, I think. AMD also supports SSE instructions to varying extents depending on the processor model.

Yes, Core2 is an x86 processor (it is also capable of being an x86-64 aka X64 processor).

--
Mats

**mynickmynick** · 09-04-2008

Originally Posted by matsp

SSE = Streaming SIMD Extensions, a set of "new" instructions on x86 processors, first introduced as SSE1 in Intel's Pentium 3, then extended further in Pentium 4; Core 2 has SSE3, I think. AMD also supports SSE instructions to varying extents depending on the processor model.

Yes, Core2 is an x86 processor (it is also capable of being an x86-64 aka X64 processor).

--
Mats

Very interesting.
Is Core2 Duo an x86 processor by default? When is it used as x64?

**matsp** · 09-04-2008

Originally Posted by mynickmynick

Very interesting.
Is Core2 Duo an x86 processor by default? When is it used as x64?

When "LME" and "LMA" are set in EFER and bit 53 of the code-segment descriptor is 1 [1] - that is, when you use a 64-bit OS.

[1] I'm sure this means little to you - but it is "how you enable 64-bit mode" on a processor with 64-bit capability, which all Athlon64 (and Opteron, Turion, Phenom etc), some Pentium 4 and all Core2 have (along with their Intel brothers and sisters).

--
Mats

**mynickmynick** · 09-12-2008

But the answer to the original question (1) covered only its second part, (1.2) When should one change the compiler/linker defaults?
So I will start a new thread for
(1.1)
When/Why is alignment needed?

**mynickmynick** · 09-12-2008

How does the default alignment provided by a compiler prevent two threads that access adjacent data from conflicting (corrupting each other's data)?

**mynickmynick** · 09-12-2008

For instance, if you have
int a1,a2;
written by thread1 and thread2 respectively

thread1
..
a1=2;
..

thread2
a2=3;

thread1 (in pseudoassembly) : get a1
thread1 : mov a1,2
thread2 : get a2 (but gets also bits of a1)
thread2 : mov a2, 3
thread1 : store a1
thread2 : store a2 (but stores also old bits of a1)!!

Does the compiler provide sufficient default alignment to prevent this?

**matsp** · 09-12-2008

Originally Posted by mynickmynick

How does the default alignment provided by a compiler prevent two threads that access adjacent data from conflicting (corrupting each other's data)?

Well, there is no such guarantee on ALL processors. If the processor is x86 [or most other types of processors], then it can access all sizes of data [except bitfields] with single operations, so there should be no conflict there.
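A small sketch of the a1/a2 scenario from the question, assuming POSIX threads and a C11 compiler. The names writer1, writer2, run_writers and ITERATIONS are invented for this example. Each thread writes only its own int, so under the C11 memory model the two objects are distinct memory locations and the writes may not interfere:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

static int a1, a2;   /* declared together, so very likely adjacent in memory */

/* Thread 1 only ever touches a1. */
static void *writer1(void *arg)
{
    (void)arg;
    for (int n = 1; n <= ITERATIONS; n++)
        a1 = n;
    return NULL;
}

/* Thread 2 only ever touches a2. */
static void *writer2(void *arg)
{
    (void)arg;
    for (int n = 1; n <= ITERATIONS; n++)
        a2 = -n;
    return NULL;
}

/* Runs both writers to completion. Returns 0 on success,
   -1 if a thread could not be created. */
static int run_writers(void)
{
    pthread_t t1, t2;

    a1 = a2 = 0;
    if (pthread_create(&t1, NULL, writer1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, writer2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Neither thread's last store was clobbered by the other. */
    assert(a1 == ITERATIONS && a2 == -ITERATIONS);
    return 0;
}
```

A successful run only shows that no interference happened this time; the guarantee itself comes from the language rules and the hardware, not from the test. Build with the compiler's pthread option (for example -pthread with GCC).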

On a few processors [early Alpha, before its byte/word extension, is the usual example], a 16-bit word modification may for example consist of a sequence like this: "load 32 bits, mask out the relevant 16 bits, modify the value, mask it again, OR it back in and store 32 bits". In this case, the processor will not be able to do that without affecting another thread if the other thread is sharing that data, so it may not be safe for multiple threads to access a struct or array consisting of 16-bit words. This, however, is a pretty rare case. I think I suggested that you avoid 16-bit words in the first place [in my first answer - I said "it is often best to err on the safe side and use int" (added emphasis)].
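The load/mask/OR/store sequence can be sketched in C as follows. This is only an illustration of the idea, not what any particular compiler emits; store16_via_32 and lost_update_demo are hypothetical names. The second function replays, on a single thread, the interleaving in which one update gets lost:

```c
#include <assert.h>
#include <stdint.h>

/* Replaces one 16-bit half of a 32-bit word (half 0 = low, 1 = high)
   the way a machine without 16-bit stores has to do it. */
static void store16_via_32(uint32_t *word, int half, uint16_t value)
{
    assert(half == 0 || half == 1);
    unsigned shift = (unsigned)half * 16u;

    uint32_t tmp = *word;                     /* load all 32 bits */
    tmp &= ~((uint32_t)0xFFFFu << shift);     /* mask out the target half */
    tmp |= (uint32_t)value << shift;          /* OR the new value in */
    *word = tmp;                              /* store all 32 bits, neighbour included */
}

/* Simulates two "threads" A and B that both load the shared word
   before either of them stores it back. */
static uint32_t lost_update_demo(void)
{
    uint32_t shared = 0;
    uint32_t copy_a = shared;                 /* A loads the word */
    uint32_t copy_b = shared;                 /* B loads the same old word */

    store16_via_32(&copy_a, 0, 0x1111);       /* A changes the low half */
    store16_via_32(&copy_b, 1, 0x2222);       /* B changes the high half */

    shared = copy_a;                          /* A stores */
    shared = copy_b;                          /* B stores stale low bits over A's update */
    return shared;
}
```

Here lost_update_demo() returns 0x22220000: the 0x1111 written by A has vanished even though B never meant to touch the low half.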

The time when you should change the alignment is when you are 100% sure that you know better than the compiler/linker what is required. The typical example of this is kernel code that requires some data to be page-aligned (4KB), or kernel code that allocates memory for a thread control block and needs the control block to be in its own cache-line, to avoid other threads sharing the same cache-line.

Finally, it is often best to make sure that you do not share cache-lines between threads - the simple solution is to store all thread-specific data in a block of data, and use malloc() to allocate the block. As long as the block is at least the size of the cache-line, it will more or less automatically avoid sharing a cache-line with another block. That solution also solves all the other problems of threads competing for the same data.
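A minimal sketch of a per-thread block, with two assumptions that are not from the post above: the cache-line size is taken to be 64 bytes (common on x86, but check your CPU), and C11 aligned_alloc() is used instead of plain malloc() so that the alignment is explicit rather than "more or less automatic". The names thread_data, thread_data_new and CACHE_LINE are made up for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumption: typical x86 cache-line size */

/* All data owned by one thread. Aligning the first member to CACHE_LINE
   makes the whole struct cache-line aligned and pads its size up to a
   whole number of cache lines. */
struct thread_data {
    alignas(CACHE_LINE) long counter;
    int id;
};

static_assert(sizeof(struct thread_data) % CACHE_LINE == 0,
              "block covers whole cache lines");

/* Allocates one block per thread, starting on a cache-line boundary.
   Returns NULL on allocation failure; release the block with free(). */
static struct thread_data *thread_data_new(int id)
{
    struct thread_data *td = aligned_alloc(CACHE_LINE, sizeof *td);
    if (td != NULL) {
        td->counter = 0;
        td->id = id;
    }
    return td;
}
```

Each thread would get its own pointer from thread_data_new() and work only through it, so no two threads write to the same cache-line.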

--
Mats

**mynickmynick** · 09-12-2008

Thank you.
I am focusing on Intel (or compatible) processors at the moment,
so I guess each one-byte, 2-byte, 4-byte or n-byte access in the C code is compiled to a one-byte, 2-byte, 4-byte or n-byte access, respectively, in the executable.

**matsp** · 09-12-2008

Originally Posted by mynickmynick

Thank you.
I am focusing on Intel (or compatible) processors at the moment,
so I guess each one-byte, 2-byte, 4-byte or n-byte access in the C code is compiled to a one-byte, 2-byte, 4-byte or n-byte access, respectively, in the executable.

Yes - x86 can access 8[1], 16[2], 32[4], 64[8], 128[16] and 80[10] bits[bytes] [the 80[10] only as floating point] as a single unit, and write it back again without disrupting nearby elements, and there is no reason why the compiler would do anything else under any circumstances, unless you are really jumping through hoops to make it so.

--
Mats
