Thread: memory granularity of the processor

  1. #1
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251

    memory granularity of the processor

    With C, Linux, pthreads:

    (1)
    How can I find out the memory granularity of the processor?
    I mean, what is the size of a single memory access (read or write)?
    I need this information to avoid multithreading problems.
    I guess 32 bits is the most common?

    (2)
    How can I control the padding of variables to 32 or 64 bits? And the padding between fields inside a structure or between array elements? (I mean gcc directives.)
    Last edited by mynickmynick; 07-14-2008 at 05:44 AM.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Depends on what you mean by "granularity", and which processor we are talking about.

    An x86 processor accessing a correctly aligned 32-bit value can read or write that value without splitting it into parts. Likewise, in 64-bit mode (on x86-64 for example), a read or write of a correctly aligned 64-bit value is atomic. So, as long as no other thread is (potentially) using the same location, it's safe to operate on 32-bit (or 64-bit) values in one thread without worrying about what other threads are doing.

    Note, however, that a read-modify-write operation on a location that more than one thread may use still needs protection, to prevent another thread from accessing the value "halfway" between the read and the write. This is true even if the operation is a single instruction on that particular processor: another CPU (core) doesn't care about instruction boundaries, as it doesn't actually know what the other processor is doing. To prevent this you would need (again, on x86) a lock prefix, which ensures that all other potential users back off the bus until the operation is finished. This is, of course, bad for performance: even if the other processors aren't currently using the bus, you still have to "ask nicely" and wait for all processors to say "yes, I agree not to use the bus until you're finished". So if this happens often, it will hurt performance.
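
    A minimal sketch of the read-modify-write point above, assuming a C11 compiler with <stdatomic.h> and pthreads. The names bump and run_counter and the thread and iteration counts are made up for illustration. On x86, gcc typically turns atomic_fetch_add into a lock-prefixed instruction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4          /* illustrative values, not from the thread */
#define ITERS    100000

static atomic_long counter; /* shared by all threads */

/* Each thread increments the shared counter ITERS times. */
static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&counter, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Starts NTHREADS threads, waits for them, returns the final count. */
long run_counter(void)
{
    pthread_t t[NTHREADS];

    atomic_store(&counter, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, bump, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&counter);
}
```

    With a plain long and counter++ instead, increments from different threads can be lost, so the total would usually come out short.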

    If you are discussing sharing (or, perhaps more importantly, non-sharing) of items between threads, you need to take the size of a cache line into account. This varies between processors, up to 128 bytes for a single cache line. You shouldn't share cache lines [that get written to with any frequency] between processors, even if it is technically safe and yields the correct value, because, just as with locking the bus, you have to advertise "I'm going to write to cache line X, can you all please forget you ever heard of that one". If all processors do this often, performance will be very poor. Note that sharing data that is written very rarely (or only once) but read often is perfectly fine.
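
    One way to keep frequently written per-thread data on separate cache lines, as a sketch only. It assumes gcc's aligned attribute and a 64-byte line (CACHE_LINE is an assumed value, not something stated in this thread). The helper cache_line_size is hypothetical; it uses glibc's _SC_LEVEL1_DCACHE_LINESIZE sysconf name where that is available.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define CACHE_LINE 64   /* assumption: common on x86, not guaranteed */

/* One counter per thread, each padded out to its own cache line. */
struct padded_counter {
    long value;
} __attribute__((aligned(CACHE_LINE)));

static struct padded_counter per_thread[4];

/* Distance in bytes between two neighbouring per-thread slots. */
size_t slot_stride(void)
{
    return (size_t)((char *)&per_thread[1] - (char *)&per_thread[0]);
}

/* Hypothetical helper: asks glibc for the L1 data cache line size and
   falls back to CACHE_LINE if the value is unavailable. */
long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return CACHE_LINE;
}
```

    The padding costs memory (64 bytes per counter here), so it only pays off for data that several threads write often.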

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    For your 2): http://gnu.huihoo.org/gcc/gcc-3.2.3/...ttributes.html

    [The same applies to other versions, that's just the first link in google that gives the answer].
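
    A short sketch of the attributes that page describes, assuming gcc (or a compatible compiler such as clang). The struct and variable names are made up for illustration, and the offsets in the comments assume a typical x86 ABI with a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: x at offset 0, y right after it. */
struct plain {
    int x;
    int y;
};

/* Each field forced onto its own 8-byte boundary. */
struct spread {
    int x __attribute__((aligned(8)));
    int y __attribute__((aligned(8)));
};

/* packed removes padding, so y lands at offset 1 (misaligned). */
struct squeezed {
    char tag;
    int  y;
} __attribute__((packed));

/* A single variable can be aligned the same way. For array elements,
   align the element type instead (an array of struct spread has a
   16-byte stride). */
static int flag __attribute__((aligned(8)));

int *flag_addr(void)
{
    return &flag;
}
```

    Note that packed goes in the opposite direction to what is wanted here: it creates exactly the kind of unaligned fields that are best avoided.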

    --
    Mats

  4. #4
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    We are talking about Pentium and the like.
    It might be dual core.
    Of course, if I share variables I lock an appropriate mutex.
    Let's neglect cache-line efficiency for the moment.
    I was focusing on non-shared adjacent variables (and preventing related problems).
    As most data are a multiple of 4 bytes and not of 8 bytes, what gcc directive can ensure I have 8-byte alignment in case my processor has 64-bit granularity? How can I find out the granularity of my processor?
    What about elements of a struct or an array?

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    All elements that are 8, 16, 32 or 64 bits wide can be accessed as single elements [atomically with respect to read and write operations] on x86. So if you have, for example:
    Code:
    struct X
    {
        int x;
        int y;
    };
    Two different threads can (if we ignore cache efficiency) then access x in one thread and y in the other thread, with no conflicts - it makes no difference if the processor is 32 or 64 bit.
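
    A sketch of that situation, assuming pthreads. The function names and the iteration count are invented for illustration. Each thread writes only its own field, so no mutex is used.

```c
#include <assert.h>
#include <pthread.h>

#define ROUNDS 100000   /* illustrative value */

struct X {
    int x;
    int y;
};

static struct X shared;

/* Thread 1 only ever touches shared.x. */
static void *write_x(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ROUNDS; i++)
        shared.x = i;
    return NULL;
}

/* Thread 2 only ever touches shared.y. */
static void *write_y(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ROUNDS; i++)
        shared.y = -i;
    return NULL;
}

/* Runs both writers; returns 1 if neither field was disturbed. */
int run_xy(void)
{
    pthread_t tx, ty;

    if (pthread_create(&tx, NULL, write_x, NULL) != 0)
        return 0;
    if (pthread_create(&ty, NULL, write_y, NULL) != 0) {
        pthread_join(tx, NULL);
        return 0;
    }
    pthread_join(tx, NULL);
    pthread_join(ty, NULL);
    return shared.x == ROUNDS && shared.y == -ROUNDS;
}
```

    Both fields sit in the same cache line, so this is correct but not fast if both threads write heavily.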

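    If you want to know whether your program is built with 64-bit pointers (and so runs in 64-bit mode), you could check
    Code:
    sizeof(int *);
    If that comes back as 4, then pointers are 32-bit and the program is running in 32-bit mode. Note that this says nothing about whether the processor itself is capable of 64-bit operation.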
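
    The sizeof check is decided at compile time, so it describes the mode the program was built for. A small sketch of it, with made-up function names; __x86_64__ and __i386__ are macros that gcc predefines for those targets.

```c
#include <limits.h>
#include <stddef.h>

/* Width of a data pointer in bits for this build (typically 32 or 64). */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Name of the target this file was compiled for. */
const char *target_name(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "other";
#endif
}
```

    A 32-bit build running on a 64-bit processor still reports 32 here.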

    I forgot to note that floating point, MMX and SSE instructions manage larger "chunks", so the processor could, when using SSE, access 128 bits of data atomically with regard to read and write operations.

    --
    Mats

  6. #6
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >> This info I need to avoid multithread problems
    Word tearing previously discussed in this thread: http://cboard.cprogramming.com/showthread.php?t=104627

    >> All elements that are 8, 16, 32 or 64 bit can be accessed as single elements ... atomically ... in x86.
    Even more precisely...
    Quote Originally Posted by
    Intel® 64 and IA-32 Architectures
    Software Developer’s Manual
    Volume 3A:
    System Programming Guide, Part 1

    7.1.1 Guaranteed Atomic Operations

    The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:
    • Reading or writing a byte
    • Reading or writing a word aligned on a 16-bit boundary
    • Reading or writing a doubleword aligned on a 32-bit boundary

    The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
    • Reading or writing a quadword aligned on a 64-bit boundary
    • 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

    The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:
    • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

    Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
    So accesses that are unaligned and get split are where you can get into trouble. Under normal circumstances, 32-bit variables are aligned to 4-byte boundaries. It's the 64-bit variables that can get you into trouble, since malloc() isn't guaranteed to provide memory of the correct alignment, etc.

    So after studying all this, it seems to me that the only way to have any kind of word tearing on x86 is via totally un-aligned access - which makes things easier in preventing any possible word tearing on x86. But just because the code is running on x86 hardware doesn't mean we can't make the code more robust for other platforms.
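
    The two conditions in the quoted manual text (alignment, and staying within one cache line) can be checked with a little address arithmetic. This is only a sketch: the function names are made up, and the 64-byte line size in the usage note below is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* True if an access of size bytes starting at addr straddles a
   boundary between two lines of the given size. */
int crosses_line(uintptr_t addr, size_t size, size_t line)
{
    return (addr / line) != ((addr + size - 1) / line);
}
```

    For example, with 64-byte lines an 8-byte access at address 56 stays inside one line, while the same access at address 60 is split across two.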

    Using the "struct X" example on a 64-bit CPU, we would need to align and pad both x and y to a 64-bit boundary. This would prevent word tearing on any given 64-bit CPU. However, on x86 we end up wasting half of the memory of a single "struct X". One way I can think of to solve this is to use two compile-time constants for alignment, ALIGN32 and ALIGN64:

    * For x86 (including 64-bit) and all 32-bit CPUs
    ALIGN32 = 4
    ALIGN64 = 8

    * For non-x86 64-bit CPUs
    ALIGN32 = 8
    ALIGN64 = 8
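
    A possible way to spell those two constants, as a sketch only. It assumes gcc's predefined macros __i386__, __x86_64__ and __LP64__; using "__LP64__ is not defined" as a stand-in for "32-bit CPU" is my assumption, not part of the proposal above.

```c
#include <stddef.h>

/* x86 (32- or 64-bit) and 32-bit targets: natural alignment is enough. */
#if defined(__i386__) || defined(__x86_64__) || !defined(__LP64__)
#  define ALIGN32 4
#else
/* Other 64-bit targets: give every 32-bit field its own 64-bit slot. */
#  define ALIGN32 8
#endif
#define ALIGN64 8

struct X {
    int x __attribute__((aligned(ALIGN32)));
    int y __attribute__((aligned(ALIGN32)));
};
```

    On x86 this leaves struct X at 8 bytes; on the other 64-bit targets it grows to 16.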

    Having said all that, I'd like to reiterate that it's best if the design excludes the possibility of word tearing to begin with.

    gg

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Whilst the C standard definition of malloc may not guarantee appropriate alignment for anything, I think you will find that in practice, malloc from glibc and MS visual studio C library will guarantee at least 8 byte alignment, if not 16 byte alignment.

    If you want to ensure alignment, then you should write an "aligned malloc" - it's pretty simple (although you have to keep track of a second pointer!):
    Code:
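    #include <stdint.h>
    #include <stdlib.h>

    /* align must be a power of two; release the block with free(*freePtr). */
    void *mallocAligned(size_t size, size_t align, void **freePtr)
    {
        *freePtr = malloc(size + align);
        if (*freePtr == NULL)
            return NULL;
        /* Round the raw address up to the next multiple of align. */
        uintptr_t addr = ((uintptr_t)*freePtr + (align - 1)) & ~(uintptr_t)(align - 1);
        return (void *)addr;
    }

    int main(void)
    {
        void *freePtr;
        char *myptr = mallocAligned(400, 64, &freePtr);
        ...
        free(freePtr);
        return 0;
    }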
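
    On Linux the second pointer can be avoided with posix_memalign, which returns memory that can be passed straight to free. A minimal sketch; the wrapper name alloc_aligned is made up. As far as I know the C standard does require malloc's result to be suitably aligned for any ordinary object type, so a helper like this mainly matters for larger alignments such as a cache line.

```c
#define _POSIX_C_SOURCE 200112L   /* makes posix_memalign visible */

#include <stdint.h>
#include <stdlib.h>

/* Returns size bytes aligned to align, or NULL on failure.
   align must be a power of two and a multiple of sizeof(void *). */
void *alloc_aligned(size_t size, size_t align)
{
    void *p = NULL;

    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;   /* release with plain free(p) */
}
```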
    --
    Mats

  8. #8
    Alessio Stella
    Join Date
    May 2008
    Location
    Italy, Bologna
    Posts
    251
    I am sorry, but I do not understand what you mean by "word tearing".

    That function is pretty tough to understand, but I mainly use static data, so I might ignore malloc problems.
    Last edited by mynickmynick; 07-28-2008 at 09:57 AM.

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Word tearing is when two threads access different pieces of data that share a single machine word, so that a write to one piece can disturb the other.

    However, it shouldn't be a problem as long as:
    1. You don't mess with the compiler settings for alignment.
    2. The processor is an x86 of some sort (AMD, Intel).
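
    To make the term concrete, here is a single-threaded simulation of the problem on an imaginary machine that can only load and store whole 32-bit words. The interleaving is written out by hand, and all names are invented for illustration. x86 has byte-sized stores, so this is a picture of what goes wrong elsewhere, not of x86 behaviour.

```c
#include <stdint.h>

/* Replace byte number idx (0 = lowest) of word w with val. */
static uint32_t set_byte(uint32_t w, int idx, uint8_t val)
{
    uint32_t mask = (uint32_t)0xFF << (8 * idx);
    return (w & ~mask) | ((uint32_t)val << (8 * idx));
}

/* Two "threads" update neighbouring bytes a (idx 0) and b (idx 1)
   that live in the same word. Returns the final word. */
uint32_t torn_update(void)
{
    uint32_t memory = 0;
    uint32_t reg_a = memory;            /* thread A loads the word  */
    uint32_t reg_b = memory;            /* thread B loads it too    */

    reg_a = set_byte(reg_a, 0, 0x11);   /* A sets a = 0x11          */
    reg_b = set_byte(reg_b, 1, 0x22);   /* B sets b = 0x22          */
    memory = reg_a;                     /* A stores 0x00000011      */
    memory = reg_b;                     /* B stores 0x00002200      */
    return memory;                      /* a's update has been lost */
}
```

    Neither thread touched the other's variable, yet A's write disappears because B wrote back a stale copy of the whole word.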

    --
    Mats
