memory granularity of the processor

**mynickmynick** · 07-14-2008

With C linux pthreads

(1)
How can I find out the memory granularity of the processor?
I mean, what is the size of a single memory access (write or read)?
I need this information to avoid multithreading problems.
I guess 32 bits is the most common?

(2)
How can I control the padding of variables to 32 or 64 bits? And the padding between fields inside a structure or between array elements? (I mean gcc directives.)

**matsp** · 07-14-2008

Depends on what you mean by "granularity", and which processor we are talking about.

An x86 processor accessing a correctly aligned 32-bit value can perform a read or write operation on such a value without splitting it into parts. Likewise, in 64-bit mode (on x86-64 for example), a correctly aligned 64-bit value is atomic with regard to read/write operations. So, as long as no other thread is (potentially) using the same location, it's safe to operate on 32-bit (or 64-bit) values in one thread without worrying about what other threads are doing.

Note, however, that a read-modify-write operation on a location that more than one thread may use still needs some protection, so that no other thread can access the value "halfway" between the read and the write. This holds even if the operation is a single instruction on that particular processor: another CPU (core) doesn't care about instruction boundaries, as it doesn't actually know what the other processor is doing. To prevent this, you need (again, on x86) a lock prefix to ensure that all other potential users back off the bus until the operation is finished. This is, of course, bad for performance: even if the other processors aren't currently using the bus, you still have to "ask nicely" and wait for all processors to say "yes, I agree to not use the bus until you're finished". So if this happens often, it will hurt performance.
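As a minimal sketch of such a protected read-modify-write, assuming a C11 compiler with `<stdatomic.h>` (in 2008 GCC's `__sync_fetch_and_add` builtin would play the same role): several threads increment one shared counter with `atomic_fetch_add`, which on x86 is typically compiled to a lock-prefixed instruction. The names `shared_counter`, `increment_worker` and `run_atomic_counter` are made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS    4
#define N_INCREMENTS 100000

/* Counter shared by all worker threads. */
static atomic_long shared_counter;

static void *increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        /* Atomic read-modify-write: no other thread can slip in
           between the read and the write of this increment. */
        atomic_fetch_add(&shared_counter, 1);
    }
    return NULL;
}

/* Starts N_THREADS workers, waits for them, returns the final count. */
long run_atomic_counter(void)
{
    pthread_t threads[N_THREADS];

    atomic_store(&shared_counter, 0);
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, increment_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&shared_counter);
}
```

With a plain `long` and `counter++` instead of the atomic operation, the final value could come out lower than `N_THREADS * N_INCREMENTS`, because increments from different threads can overwrite each other.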

If you are discussing sharing (or, perhaps more importantly, non-sharing) of items between threads, you need to take into account the size of a cache line. This varies between processors, up to 128 bytes for a single cache line. You shouldn't share cache lines [that get written to with any frequency] between processors, even if it is technically safe and yields the correct value, because just like with locking the bus, you have to advertise "I'm going to write to cache line X, can you all please forget you ever heard of that one". If all processors do this often, you get very poor performance. Note that sharing data that is written very rarely (or only once) but read often is perfectly fine.
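A minimal sketch of keeping per-thread data on separate cache lines, assuming GCC on Linux with glibc. The 64-byte line size is an assumption (common on x86, not universal), and `padded_counter` and `cache_line_size` are names invented for this example. `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` is a glibc extension that may be missing or return 0 on some systems, hence the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define ASSUMED_CACHE_LINE 64   /* assumption: typical x86 value */

/* One counter per thread, padded and aligned so that two counters
   never end up in the same (assumed) cache line. */
struct padded_counter {
    long value;
    char pad[ASSUMED_CACHE_LINE - sizeof(long)];
} __attribute__((aligned(ASSUMED_CACHE_LINE)));

static_assert(sizeof(struct padded_counter) == ASSUMED_CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* Asks the C library for the L1 data cache line size; falls back to
   the assumed value when the library cannot tell. */
long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return ASSUMED_CACHE_LINE;
}
```

An array such as `struct padded_counter counters[NUM_THREADS]` then gives each thread its own line, at the cost of some wasted memory.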

--
Mats

**matsp** · 07-14-2008

For your 2): http://gnu.huihoo.org/gcc/gcc-3.2.3/...ttributes.html

[The same applies to other versions, that's just the first link in google that gives the answer].
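For illustration, a short sketch of the GCC attributes that control alignment and padding, assuming GCC (or a compatible compiler such as Clang) in C11 mode. The type and variable names are hypothetical; `aligned` and `packed` are the real attribute names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A single variable forced onto an 8-byte boundary. */
static int32_t aligned_var __attribute__((aligned(8)));

/* Each field starts on its own 8-byte boundary, so the compiler
   inserts 4 bytes of padding after each 32-bit field. */
struct padded_fields {
    int32_t a __attribute__((aligned(8)));
    int32_t b __attribute__((aligned(8)));
};

/* Aligning the element type pads every array element to 8 bytes. */
typedef struct {
    int32_t v;
} __attribute__((aligned(8))) padded_elem;

/* packed does the opposite: it removes padding between fields. */
struct packed_fields {
    char    c;
    int32_t i;
} __attribute__((packed));

static_assert(offsetof(struct padded_fields, b) == 8, "b is padded to offset 8");
static_assert(sizeof(padded_elem) == 8, "array elements are 8 bytes apart");
static_assert(offsetof(struct packed_fields, i) == 1, "no padding after c");

/* Returns nonzero if aligned_var really sits on an 8-byte boundary. */
int var_is_aligned(void)
{
    return ((uintptr_t)&aligned_var % 8) == 0;
}
```

Note that `packed` can produce misaligned fields, which is exactly the situation to avoid for data touched by several threads.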

--
Mats

**mynickmynick** · 07-14-2008

We are talking about Pentium and the like.
It might be dual core.
Of course, if I share variables I lock an appropriate mutex.
Let's neglect cache line efficiency for the moment.
I was focusing on non-shared adjacent variables (and preventing related problems).
As most data sizes are multiples of 4 bytes rather than 8 bytes, what gcc directive can ensure 8-byte alignment in case my processor has 64-bit granularity? How can I find out the granularity of my processor?
What about elements of a struct or an array?

**matsp** · 07-14-2008

All elements that are 8, 16, 32 or 64 bit can be accessed as single elements [atomically with respect to read and write operations] in x86. So if you have for example

Code:

struct X
{
    int x;
    int y;
};

Two different threads can (if we ignore cache efficiency) then access x in one thread and y in the other thread, with no conflicts - it makes no difference if the processor is 32 or 64 bit.
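A small sketch of that situation with pthreads: one thread only ever touches `x`, the other only `y`, and no mutex is involved. The helper names `bump` and `run_two_writers` and the iteration count are invented for the example.

```c
#include <assert.h>
#include <pthread.h>

struct X
{
    int x;
    int y;
};

#define N_ITER 1000000

/* Increments the int it is given N_ITER times. volatile forces a real
   memory access on every iteration instead of one final store. */
static void *bump(void *field)
{
    volatile int *p = field;
    for (int i = 0; i < N_ITER; i++)
        (*p)++;
    return NULL;
}

/* Thread a writes only s.x, thread b writes only s.y. */
struct X run_two_writers(void)
{
    struct X s = {0, 0};
    pthread_t a, b;
    int rc;

    rc = pthread_create(&a, NULL, bump, &s.x);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, bump, &s.y);
    assert(rc == 0);
    (void)rc;

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return s;
}
```

Both fields end up at `N_ITER`. The two fields do share a cache line, so this is correct but not necessarily fast.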

If you want to know whether your program is running in 64-bit mode, you could check

Code:

sizeof(int *);

If that comes back as 4, then pointers are 32-bit, and thus the program is running in 32-bit mode (the processor itself may still be 64-bit capable).
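A slightly fuller sketch of the same check, assuming GCC-style predefined macros (`__x86_64__`, `__i386__`). The function names are made up for the example. It reports how the program was built, not what the CPU could do.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Width of a data pointer in bits for this build. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Target architecture according to the compiler's predefined macros. */
const char *build_target(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "other";
#endif
}

/* Prints a one-line summary, e.g. from main(). */
void print_build_info(void)
{
    printf("target: %s, pointers: %u bits\n", build_target(), pointer_bits());
}
```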

I forgot to note that floating point, MMX and SSE instructions manage larger "chunks", so the processor could, when using SSE, access 128 bits of data in one go. Only accesses of up to 64 bits are architecturally guaranteed to be atomic, though.

--
Mats

**Codeplug** · 07-14-2008

>> This info I need to avoid multithread problems
Word tearing previously discussed in this thread: http://cboard.cprogramming.com/showthread.php?t=104627

>> All elements that are 8, 16, 32 or 64 bit can be accessed as single elements ... atomically ... in x86.
Even more precisely...

Originally Posted by Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1

7.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

Reading or writing a byte
Reading or writing a word aligned on a 16-bit boundary
Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

Reading or writing a quadword aligned on a 64-bit boundary
16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

So the final paragraph of the quote (accesses split across bus widths, cache lines, and page boundaries) is where you can get into trouble. Under normal circumstances, 32-bit variables are aligned to 4-byte boundaries. It's the 64-bit variables that can get you into trouble, since malloc() isn't guaranteed to provide memory of the correct alignment, etc.
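To make the "split access" case concrete, here is a small sketch of two checks: whether an address is aligned, and whether an access straddles two cache lines. The 64-byte line size is an assumption, and `is_aligned` and `crosses_cache_line` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHE_LINE 64u   /* assumption: typical x86 value */

static_assert((ASSUMED_CACHE_LINE & (ASSUMED_CACHE_LINE - 1)) == 0,
              "cache line size must be a power of two");

/* Nonzero if p is a multiple of align (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Nonzero if an access of size bytes starting at p touches two
   different cache lines, i.e. the kind of split access the manual
   does not guarantee to be atomic. */
int crosses_cache_line(const void *p, size_t size)
{
    uintptr_t first = (uintptr_t)p / ASSUMED_CACHE_LINE;
    uintptr_t last  = ((uintptr_t)p + size - 1) / ASSUMED_CACHE_LINE;
    return first != last;
}
```

For example, an 8-byte value at offset 60 of a cache-line-aligned buffer crosses a line boundary, while the same value at offset 56 does not.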

So after studying all this, it seems to me that the only way to have any kind of word tearing on x86 is via totally un-aligned access - which makes things easier in preventing any possible word tearing on x86. But just because the code is running on x86 hardware doesn't mean we can't make the code more robust for other platforms.

Using the "struct X" example on a 64-bit CPU that cannot write less than 64 bits at a time, we would need to align and pad both x and y to a 64-bit boundary. This would prevent word tearing on any given 64-bit CPU. However, on x86 we end up wasting half of the memory of a single "struct X". One way I can think of to solve this is to use two compile time constants for alignment, ALIGN32 and ALIGN64 - then:

* For x86, x86-64, and all 32-bit CPUs
ALIGN32 = 4
ALIGN64 = 8

* For non-x86, 64-bit CPUs
ALIGN32 = 8
ALIGN64 = 8
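One possible way to write those constants down, as a sketch only: it assumes GCC-style predefined macros for x86 and uses the pointer width as a stand-in for "32-bit CPU", which is an approximation and not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 (32- or 64-bit) and any build with 32-bit pointers: natural alignment. */
#if defined(__i386__) || defined(__x86_64__) || UINTPTR_MAX <= 0xFFFFFFFFu
#  define ALIGN32 4
#  define ALIGN64 8
#else
/* Other 64-bit targets: give every 32-bit field its own 64-bit slot. */
#  define ALIGN32 8
#  define ALIGN64 8
#endif

struct X
{
    int32_t x __attribute__((aligned(ALIGN32)));
    int32_t y __attribute__((aligned(ALIGN32)));
};

static_assert(offsetof(struct X, y) == ALIGN32,
              "y starts one alignment unit after x");
```

On x86 the struct stays at 8 bytes; on the other branch it grows to 16 bytes, which is the memory cost mentioned above.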

Having said all that, I'd like to reiterate that it's best if the design excludes the possibility of word tearing to begin with.

gg

**matsp** · 07-15-2008

The C standard does require malloc to return memory suitably aligned for any object type, and in practice malloc from glibc and the MS Visual Studio C library will guarantee at least 8-byte alignment, if not 16-byte alignment.

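If you want to ensure a particular alignment, you can write an "aligned malloc" - it's pretty simple (although you have to keep track of a second pointer for free()):

Code:

#include <stdint.h>
#include <stdlib.h>

/* Returns a pointer aligned to 'align', which must be a power of two.
   *freePtr receives the pointer that must later be passed to free(). */
void *mallocAligned(size_t size, size_t align, void **freePtr)
{
    /* Over-allocate so that an aligned address always fits in the block. */
    *freePtr = malloc(size + align - 1);
    if (*freePtr == NULL)
        return NULL;
    /* Round the address up to the next multiple of align. */
    uintptr_t addr = ((uintptr_t)*freePtr + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)addr;
}

int main(void)
{
    void *freePtr;
    char *myptr = mallocAligned(400, 64, &freePtr);
    /* ... */
    free(freePtr);
    return 0;
}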
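On Linux, an alternative that avoids the second pointer is POSIX `posix_memalign`, whose result can be passed straight to `free()`. A minimal sketch, where the wrapper name `alloc_aligned` is invented for the example:

```c
#define _POSIX_C_SOURCE 200112L   /* makes posix_memalign visible */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns size bytes aligned to align, or NULL on failure.
   align must be a power of two and a multiple of sizeof(void *).
   Release the memory with free(). */
void *alloc_aligned(size_t size, size_t align)
{
    void *p = NULL;

    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    assert(((uintptr_t)p & (align - 1)) == 0);
    return p;
}
```

For example, `alloc_aligned(400, 64)` gives a 400-byte block that starts on a 64-byte boundary.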

--
Mats

**mynickmynick** · 07-28-2008

I am sorry, but I do not understand what you mean by "word tearing".

That function is pretty tough to understand, but I mainly use static data, so I might ignore the malloc problems.

**matsp** · 07-29-2008

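Word tearing is when two threads update different variables that live in the same machine word, and the hardware can only write the whole word at once: each thread reads the word, changes its own part and writes the whole word back, so one thread's update can overwrite the other's.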

However, it shouldn't be a problem as long as:
1. You don't mess with the compiler settings for alignment.
2. The processor is an x86 of some sort (AMD, Intel).
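To show what goes wrong on a machine that does have this problem, here is a single-threaded simulation, not real multithreaded code. It assumes a hypothetical CPU whose only store instruction writes a whole 32-bit word, and replays one unlucky interleaving of two threads by hand. All names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* One machine word holding four independent one-byte variables. */
typedef struct {
    uint8_t bytes[4];
} word_t;

static_assert(sizeof(word_t) == 4, "word_t models a 32-bit word");

/* The simulated CPU can only load and store whole words. */
static word_t load_word(const word_t *mem)   { return *mem; }
static void store_word(word_t *mem, word_t w) { *mem = w; }

/* Replays the interleaving: A reads, B reads, A writes, B writes. */
word_t simulate_tearing(void)
{
    word_t mem = {{0, 0, 0, 0}};

    word_t a = load_word(&mem);   /* thread A reads the word      */
    word_t b = load_word(&mem);   /* thread B reads the same word */
    a.bytes[0] = 0xAA;            /* A updates "its" variable     */
    b.bytes[1] = 0xBB;            /* B updates "its" variable     */
    store_word(&mem, a);          /* A writes the whole word back */
    store_word(&mem, b);          /* B overwrites it: A's update is lost */

    return mem;
}
```

After the replay, byte 1 holds B's value but byte 0 is back to 0, even though the two "threads" never touched the same variable. On x86 this cannot happen for aligned variables, because byte, 16-bit and 32-bit stores exist as separate atomic operations.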

--
Mats
