What is the best way to ensure synchronization and memory consistency in the following scenario:
- Two processes are running on a multi-core machine
- The machine also has an RDMA-capable network (InfiniBand)
- The two processes have read/write access to a window of shared memory. This window is also registered with the network interface, so the data in the shared-memory region can be changed either by the processes on the node or by the network.
Suppose P1 is running on core0 and P2 on core1.
P2 initializes its shared-memory window with
memset(.., 1, size); P1 then writes 0's into this shared-memory window through the network interface (RDMA, loop-back). I get the required network completion event, so as far as the network is concerned the 0's have been written into the shared memory. When P2 then reads this memory region, it should read "0" instead of "1". But I am seeing a race condition here, and this is not always the case.
I have tried the following cases:
- pthread-mutex:
1. A pthread_mutex_t lives in this shared-memory window, initialized with the attribute PTHREAD_PROCESS_SHARED.
2. P1 locks the mutex and performs the network operation. When P1 gets the network completion event, it releases the lock.
3. P2 then locks the mutex and reads the data from the shared-memory window.
- memory barriers:
I have tried inserting the following memory barriers just after P2 acquires the lock and just before it reads the data. It still doesn't seem to help.
__asm__ __volatile__ ( "lfence" ::: "memory" );  /* load fence */
__asm__ __volatile__ ( "sfence" ::: "memory" );  /* store fence */
__asm__ __volatile__ ( "mfence" ::: "memory" );  /* full fence */
__asm__ __volatile__ ( "sfence" ::: "memory" );
__sync_synchronize();                            /* GCC full barrier */
Any suggestions on how to handle such a situation to ensure correctness? I am using the GNU compilers, and the compute node is an Intel Westmere machine.
Thanks,
--K