If you are serious about performance optimisation, you should start timing your code.
I've got evidence that you can speed up alu_bit_set_bit() by at least 4x:
Code:
alu_bit_set_bit() took 4.203500 nanoseconds per iteration
alu_bit_set_bit_b() took 0.964200 nanoseconds per iteration
Test passed
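For anyone who wants to collect per-iteration figures like these, here is a minimal sketch of a timing loop. It is not the harness that produced the numbers above: `bench_fn`, `ns_per_iteration()` and `set_low_bit()` are hypothetical names made up for illustration, and it uses the standard `clock()`, which measures CPU time at a fairly coarse resolution, so it needs a large iteration count to give a stable average.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the code under test; not part of the project. */
typedef void (*bench_fn)(void *ctx);

/* Average CPU time in nanoseconds for one call of fn(ctx), over iters calls.
 * Returns 0.0 when there is nothing to measure. */
double ns_per_iteration(bench_fn fn, void *ctx, unsigned long iters)
{
    clock_t start, end;
    unsigned long i;

    if (fn == NULL || iters == 0)
        return 0.0;

    start = clock();
    for (i = 0; i < iters; i++)
        fn(ctx);
    end = clock();

    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9 / (double)iters;
}

/* Hypothetical workload: sets the lowest bit of one word. The volatile
 * access stops the compiler from optimising the store away. */
void set_low_bit(void *ctx)
{
    volatile size_t *word = ctx;
    *word |= 1;
}
```

Calling `ns_per_iteration(set_low_bit, &word, 100000000UL)` for each variant and printing the two results side by side is enough to show a difference of this size. The call through a function pointer adds the same overhead to every variant, so compare the ratio rather than the absolute figures.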
Code:
struct alu_bit alu_bit_inc( struct alu_bit pos )
{
	uintptr_t ptr = (uintptr_t)(pos.ptr);
	pos.bit++;
	pos.mask <<= 1;
	pos.pos = SET2IF( pos.mask, pos.pos + 1, 0 );
	pos.seg = SET2IF( pos.mask, pos.seg, pos.seg + 1 );
	pos.ptr = (size_t*)SET2IF( pos.mask, ptr, ptr + sizeof(size_t) );
	pos.mask = SET2IF( pos.mask, pos.mask, SIZE_T_C(1) );
	return pos;
}
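void alu_bit_inc_b( struct alu_bit *pos )
{
	if ( pos == NULL )
		return;
	pos->bit++;
	/* Keep pos->pos in step with alu_bit_inc(), which adds 1 while the
	 * mask is still non-zero. */
	pos->pos++;
	pos->mask <<= 1;
	if ( pos->mask != 0 )
		return;
	/* The mask shifted out of the word: move on to the next segment. */
	pos->mask = 1;
	pos->pos = 0;
	pos->seg++;
	pos->ptr++;
}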
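Before trusting a faster variant it is worth checking that it walks through exactly the same states as the original. Below is a rough, self-contained sketch of such a check. Everything in it is an assumption made for illustration: `struct bit_pos` only mirrors the fields used above (the real `struct alu_bit` may differ), and `select_if()` is my guess at what a branchless select like SET2IF does, namely return the second argument when the first is non-zero and the third otherwise.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields used above; not the project's struct. */
struct bit_pos {
    size_t  bit;   /* absolute bit index */
    size_t  pos;   /* bit index inside the current word */
    size_t  seg;   /* index of the current word */
    size_t  mask;  /* single-bit mask for the current bit */
    size_t *ptr;   /* address of the current word */
};

/* Assumed branchless select: t when cond is non-zero, otherwise f. */
static uintptr_t select_if(uintptr_t cond, uintptr_t t, uintptr_t f)
{
    uintptr_t all = (uintptr_t)0 - (uintptr_t)(cond != 0); /* all ones or 0 */
    return (t & all) | (f & ~all);
}

/* Branchless increment, by value. */
struct bit_pos inc_branchless(struct bit_pos p)
{
    uintptr_t addr = (uintptr_t)p.ptr;
    p.bit++;
    p.mask <<= 1;
    p.pos  = (size_t)select_if(p.mask, p.pos + 1, 0);
    p.seg  = (size_t)select_if(p.mask, p.seg, p.seg + 1);
    p.ptr  = (size_t *)select_if(p.mask, addr, addr + sizeof(size_t));
    p.mask = (size_t)select_if(p.mask, p.mask, 1);
    return p;
}

/* Branching increment, in place. */
void inc_branching(struct bit_pos *p)
{
    p->bit++;
    p->pos++;
    p->mask <<= 1;
    if (p->mask != 0)
        return;
    p->mask = 1;
    p->pos = 0;
    p->seg++;
    p->ptr++;
}

/* Steps both variants through the same buffer and returns how many steps
 * left them in different states. steps is clamped so ptr stays in bounds. */
size_t count_mismatches(size_t steps)
{
    enum { WORDS = 8 };
    size_t words[WORDS] = {0};
    size_t limit = (WORDS - 1) * sizeof(size_t) * CHAR_BIT;
    struct bit_pos a = {0, 0, 0, 1, words};
    struct bit_pos b = a;
    size_t i, bad = 0;

    if (steps > limit)
        steps = limit;
    for (i = 0; i < steps; i++) {
        a = inc_branchless(a);
        inc_branching(&b);
        bad += (a.bit != b.bit || a.pos != b.pos || a.seg != b.seg ||
                a.mask != b.mask || a.ptr != b.ptr);
    }
    return bad;
}
```

A result of 0 from `count_mismatches()` means the two variants agreed at every step, including across word boundaries. Comparing every field rather than just the mask is what catches a variant that forgets to update one of them.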
PS: I can no longer get the project to build without hacking it a little, though only the headers needed changes.