Thread: Alignment tutorial needed!!

  1. #1
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251

    Alignment tutorial needed!!

    I urgently need to understand all the alignment issues and problems, starting from zero, in C or C++, on Linux with pthreads or on Microsoft Windows.
    I apologize if I have already discussed this topic partially in other threads, but due to my ignorance I didn't manage to understand it, so it would be better to "reset and start again".

    (1) When and why is alignment needed? When should one change the compiler/linker defaults?
    (2) What about the alignment of arrays and structures?
    (3) What about the chance that two threads interfere with each other if they access two adjacent data items?

    Thank you very much

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    (1) When you know for sure that you know better than the compiler. This is rare. Typically, it is necessary when the natural alignment of your data is smaller than the alignment the hardware requires - for example, you store data in a char arr[16] but want to use SSE to access it [efficiently], so you need 16-byte alignment, whilst the natural alignment of the data type is 1. The same applies to an array double d[2] that you want to use with SSE: it needs to be aligned to 16 bytes, not the 8 that the compiler provides.

    (2) Normally, arrays and structs are aligned to the largest alignment of any item in the array/struct. Internally, within a struct, members are [normally] placed at the natural alignment of the individual elements. Arrays themselves are aligned to the alignment of their elements - so an int array would be aligned to 4 bytes, whilst a double array would be aligned to 8 bytes.

    (3) This does not happen on x86 [at least with respect to correctness of the result - performance may suffer if you get this wrong]. On other architectures, you'd have to figure out what the machine is capable of with regard to reading/writing partial natural words. It is often best to err on the side of caution here and use, for example, int, which is guaranteed to be easily accessible on any machine [note that you may want to use a typedef here so that you can easily change the type depending on the architecture].

    Note also that a lot of problems with threads can be avoided by storing each thread's own data in its own allocated memory block instead of, for example, in an array that is shared between threads. The internal alignment and spacing that happens naturally from allocating the memory will "automagically" remove the overlapping cache-line problems and such.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by matsp
    (1) SSE

    (3) This does not happen on x86 [at least with respect to correctness of the result - performance may suffer if you get this wrong]. Mats
    Thank you.
    (1) What is SSE?
    (3) Does x86 include the Core 2 Duo?

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    SSE = Streaming SIMD Extensions, a set of "new" instructions on x86 processors, first introduced (as SSE1) in Intel's Pentium III and extended further in the Pentium 4; Core 2 has SSE3, I think. AMD processors also support SSE instructions to varying extents, depending on the model.

    Yes, Core 2 is an x86 processor (it can also run as an x86-64, aka x64, processor).

    --
    Mats

  5. #5
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Quote Originally Posted by matsp
    SSE = Streaming SIMD Extensions, a set of "new" instructions on x86 processors, first introduced (as SSE1) in Intel's Pentium III and extended further in the Pentium 4; Core 2 has SSE3, I think. AMD processors also support SSE instructions to varying extents, depending on the model.

    Yes, Core 2 is an x86 processor (it can also run as an x86-64, aka x64, processor).

    --
    Mats

    Very interesting.
    Is the Core 2 Duo an x86 processor by default? When is it used as x64?

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by mynickmynick
    Very interesting.
    Is the Core 2 Duo an x86 processor by default? When is it used as x64?
    When "LME" and "LMA" are set in EFER and bit 53 of the code-segment descriptor is 1 [1] - that is, when you use a 64-bit OS.

    [1] I'm sure this means little to you, but it is "how you enable 64-bit mode" on a 64-bit capable processor, such as all Athlon 64s (and Opterons, Turions, Phenoms, etc.), some Pentium 4s and all Core 2s (along with their Intel brothers and sisters).

    --
    Mats

  7. #7
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    But of the original question (1), only the second part (1.2), "When should one change the compiler/linker defaults?", was answered.
    So I will start a new thread for (1.1): When and why is alignment needed?

  8. #8
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    How does the default alignment provided by a compiler prevent two threads that access adjacent data from conflicting (corrupting each other's data)?

  9. #9
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    For instance, suppose you have
    int a1, a2;
    where a1 is written by thread1 and a2 by thread2:

    thread1:
    ...
    a1 = 2;
    ...

    thread2:
    ...
    a2 = 3;
    ...

    In pseudo-assembly, the interleaving I am worried about is:
    thread1: load a1
    thread1: mov a1, 2
    thread2: load a2 (but also gets the bits of a1)
    thread2: mov a2, 3
    thread1: store a1
    thread2: store a2 (but also stores the old bits of a1)!!

    Does the compiler's default alignment provide enough separation to prevent this?

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by mynickmynick
    How does the default alignment provided by a compiler prevent two threads that access adjacent data from conflicting (corrupting each other's data)?
    Well, there is no such guarantee on ALL processors. If the processor is x86 [and most other types of processors], then it can access all sizes of data [except bitfields] with single operations, so there should be no conflict there.

    On a few processors [MIPS comes to mind, and I think Alpha also does this], a 16-bit word modification may, for example, consist of a sequence like this: "load 32 bits, mask out the relevant 16 bits, modify the value, mask it again, OR it back in and store 32 bits". In that case, the processor cannot do the write without affecting another thread that shares the same 32-bit word, so it may not be safe for multiple threads to access a struct or array consisting of 16-bit words. This, however, is a pretty rare case. I think I suggested that you avoid 16-bit words in the first place [in my first answer I said "it is often best to err on the safe side and use int" (added emphasis)].

    The time when you should change the alignment is when you are 100% sure that you know better than the compiler/linker what is required. The typical example of this is kernel code that requires some data to be page-aligned (4KB), or a kernel allocating memory for a thread control block, where the control block must be in its own cache line to avoid other threads sharing the same cache line.

    Finally, it is often best to make sure that you do not share cache lines between threads - the simple solution is to store all thread-specific data in a block of data, and use malloc() to allocate the block. As long as the block is at least the size of a cache line, it will more or less automatically avoid sharing cache lines with other threads' data. And that solution solves all the other problems of threads competing with each other too.

    --
    Mats

  11. #11
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    Thank you.
    I'm focusing on Intel (or compatible) processors at the moment,
    so I guess each 1-byte, 2-byte, 4-byte or n-byte access in the C code is compiled as a 1-byte, 2-byte, 4-byte or n-byte access, respectively, in the executable?
    Last edited by mynickmynick; 09-12-2008 at 04:32 AM.

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by mynickmynick
    Thank you.
    I'm focusing on Intel (or compatible) processors at the moment,
    so I guess each 1-byte, 2-byte, 4-byte or n-byte access in the C code is compiled as a 1-byte, 2-byte, 4-byte or n-byte access, respectively, in the executable?
    Yes - x86 can access 8, 16, 32, 64, 128 and 80 bits (1, 2, 4, 8, 16 and 10 bytes) as a single unit [the 80-bit size only as floating point], and write it back again without disrupting nearby elements. There is no reason why the compiler would do anything else under any circumstances, unless you are really jumping through hoops to make it so.

    --
    Mats
