Gents,
I am using atomic instructions on x64, and the variables they operate on must be 16 byte aligned.
I use a number of structures whose members are operated upon in this way.
The structures must accordingly be 16 byte aligned and padded: their internal members must be on 16 byte boundaries and, crucially, there must be tail padding to a 16 byte boundary, so I can allocate arrays of these structures and use pointer math to iterate.
(I am naturally using aligned malloc).
The problem I am finding is that it is not apparent to me how to achieve this end.
Here below we have a test structure (currently I'm working with the latest Amazon Linux GCC, 4.6.3, on x64);
Code:
#define LFDS700_ALIGN_DOUBLE_POINTER 16
#define LFDS700_ALIGN(alignment) __attribute__( (aligned(alignment)) )
LFDS700_ALIGN(LFDS700_ALIGN_DOUBLE_POINTER) struct test_element
{
  struct lfds700_freelist_element
    fe;
  lfds700_atom_t
    thread_number;
  unsigned int
    datum;
};
This in turn contains, as you have seen, a struct lfds700_freelist_element, thus (PAC_SIZE is 2);
Code:
LFDS700_ALIGN(LFDS700_ALIGN_DOUBLE_POINTER) struct lfds700_freelist_element
{
  struct lfds700_freelist_element
    *next[PAC_SIZE];
  void const
    *user_data;
};
I allocate an array of test elements, thus;
Code:
te_array = abstraction_aligned_malloc( sizeof(struct test_element) * 100000, LFDS700_ALIGN_DOUBLE_POINTER );
The problem manifest is that sizeof(struct test_element) is 40 bytes!
So the second element does not begin on a 16 byte boundary and we all fall down.
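To spell out the arithmetic, here is a minimal sketch, assuming only that the array base itself is 16 byte aligned. The function names are hypothetical and are not part of lfds700.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT 16  /* same value as LFDS700_ALIGN_DOUBLE_POINTER */

/* byte offset of element `index` from the (aligned) start of the array */
size_t element_offset( size_t element_size, size_t index )
{
  return element_size * index;
}

/* returns 1 if each of the first `count` elements starts on an
   ALIGNMENT boundary, 0 otherwise */
int all_elements_aligned( size_t element_size, size_t count )
{
  size_t i;

  assert( element_size > 0 );

  for( i = 0 ; i < count ; i++ )
    if( element_offset(element_size, i) % ALIGNMENT != 0 )
      return 0;

  return 1;
}
```

With a 40 byte element, element 1 starts 40 bytes in, which is 8 bytes past a 16 byte boundary; an element size of 48 would keep every element aligned.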
Printing the addresses of the members of the first element in the test element array, I see the following;
Code:
(gdb) print *ts->te_array
$2 = {fe = {next = {0x7fffec0008d0, 0x2}, user_data = 0x7fffdc0008d0}, thread_number = 3, datum = 0}
(gdb) print sizeof(struct test_element)
$3 = 40
(gdb) print &ts->te_array->fe.next
$4 = (struct lfds700_freelist_element *(*)[2]) 0x7fffdc0008d0 (16 bytes long and aligned on 16 bytes)
(gdb) print &ts->te_array->fe.user_data
$5 = (const void **) 0x7fffdc0008e0 (8 bytes long and aligned on 16 bytes)
(gdb) print &ts->te_array->thread_number
$6 = (lfds700_atom_t *) 0x7fffdc0008e8 (8 bytes long and aligned on 8 bytes)
(gdb) print &ts->te_array->datum
$7 = (unsigned int *) 0x7fffdc0008f0 (8 bytes long and aligned on 16 bytes)
So we see that fe.next is the first member, and so is correctly aligned courtesy of aligned malloc, and is 16 bytes long; fe.user_data is correctly aligned; but then te->thread_number is misaligned, and te->datum is given eight bytes rather than four, leaving us in the end without correct tail padding to a 16 byte boundary.
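For comparison, the offsets in the gdb output are the same as those of a structure with no alignment attribute at all. The sketch below uses hypothetical stand-in types: uint64_t in place of the pointers and of lfds700_atom_t, on the assumption that both are 8 bytes on x64, as the gdb output suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stand-in for struct lfds700_freelist_element, no alignment attribute */
struct plain_freelist_element
{
  uint64_t
    next[2];        /* assumption: two 8 byte pointers */
  uint64_t
    user_data;      /* assumption: one 8 byte pointer */
};

/* stand-in for struct test_element, no alignment attribute */
struct plain_test_element
{
  struct plain_freelist_element
    fe;
  uint64_t
    thread_number;  /* assumption: lfds700_atom_t is 8 bytes */
  unsigned int
    datum;
};

/* checks that the natural x64 layout matches the gdb addresses above */
void check_plain_layout( void )
{
  assert( offsetof(struct plain_test_element, fe.user_data) == 16 );
  assert( offsetof(struct plain_test_element, thread_number) == 24 );
  assert( offsetof(struct plain_test_element, datum) == 32 );
  assert( sizeof(struct plain_test_element) == 40 );
}
```

The four bytes after datum are then just the usual tail padding to the natural 8 byte alignment of the structure, which would account for the size of 40.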
So, what gives? How *am* I supposed to indicate to the compiler that it must pad structures to 16 byte boundaries?
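For reference, here is a minimal sketch of the layout being asked for. It relies on GCC's documented placement for type attributes, which is between the struct keyword and the tag (or just after the closing brace), and on per-member aligned attributes for the internal members. All names are hypothetical, uint64_t is an assumed stand-in for lfds700_atom_t, and this has not been checked against the real lfds700 headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGN(alignment) __attribute__( (aligned(alignment)) )

/* attribute placed between `struct` and the tag, so it applies to the type */
struct SKETCH_ALIGN(16) sketch_freelist_element
{
  struct sketch_freelist_element
    *next[2];
  void const
    *user_data;
};

struct SKETCH_ALIGN(16) sketch_test_element
{
  struct sketch_freelist_element
    fe;
  uint64_t                           /* assumption: stand-in for lfds700_atom_t */
    thread_number SKETCH_ALIGN(16);  /* member placed on a 16 byte boundary */
  unsigned int
    datum SKETCH_ALIGN(16);          /* member placed on a 16 byte boundary */
};

/* returns 1 if the size and the member offsets are all multiples of 16 */
int sketch_layout_is_padded( void )
{
  return sizeof(struct sketch_test_element) % 16 == 0
      && offsetof(struct sketch_test_element, thread_number) % 16 == 0
      && offsetof(struct sketch_test_element, datum) % 16 == 0;
}
```

Because the alignment of a structure is at least that of its most aligned member, and its size is rounded up to a multiple of its alignment, an array of such elements would keep every element on a 16 byte boundary, given a 16 byte aligned base.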