Nobody can review a codebase in 10 minutes. This took much longer.
I ran a profile over the latest build:
Code:
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative    self               self     total
 time   seconds    seconds    calls   Ts/call  Ts/call  name
 0.00      0.00      0.00     86753     0.00     0.00   alu_bit_dec
 0.00      0.00      0.00     39747     0.00     0.00   alu_bit_inc
 0.00      0.00      0.00      3159     0.00     0.00   alu_bit_set_bit
 0.00      0.00      0.00      1653     0.00     0.00   alu_reg_end_bit
 0.00      0.00      0.00       816     0.00     0.00   alu_reg_cmp
 0.00      0.00      0.00       191     0.00     0.00   alu_reg__shl
 0.00      0.00      0.00       186     0.00     0.00   alu_lowest_upto
 0.00      0.00      0.00       175     0.00     0.00   alu_reg_sub
 0.00      0.00      0.00       165     0.00     0.00   alu_get_reg_n.....
As it stands, alu_bit_dec() and alu_bit_inc() are by far the most-called functions, so an improvement in those will pay off many times over compared with, say, improving alu_reg_cmp().
So let's see the code involved:
From alu_bit.h, where the structure is defined:
Code:
typedef struct alu_bit {
    size_t s, b, p, *S, B;
} alu_bit_t;
For somebody who doesn't know the codebase, the members have zero context and meaning. That goes doubly for member names that differ only in case (e.g. "s" and "S").
From alu_bit.c:
Code:
struct alu_bit alu_bit_inc( struct alu_bit pos )
{
    pos.b++;
    pos.p++;
    pos.B <<= 1;

    if ( pos.B )
        return pos;

    pos.p = 0;
    pos.B = SIZE_T_C(1);
    pos.s++;
    pos.S++;

    return pos;
}
And here is an example of using it:
Code:
int_t alu_reg_not( alu_t *alu, alu_reg_t num )
{
    alu_bit_t n = {0};
    void *part;

    if ( alu )
    {
        num.node %= alu_used( alu );
        part = alu_reg_data( alu, num );
        n = alu_bit_set_bit( part, num.from );

        for ( ; n.b < num.upto; n = alu_bit_inc( n ) )
        {
            *(n.S) ^= n.B;
        }

        return 0;
    }

    return alu_err_null_ptr( "alu" );
}
I still really don't like the way you copy a structure with lots of members onto the stack, manipulate it inside the function, then copy the whole thing back off the stack into the original structure. It will be slow.
However, let's improve the code a bit. First, let's work out what is going on:
Code:
struct alu_bit alu_bit_inc( struct alu_bit pos )
{
    pos.b++;       // Increment the bit number
    pos.p++;       // Increment the position number, in this word
    pos.B <<= 1;   // Shift the mask over one bit

    if ( pos.B )   // Have we not used all the bits in this word?
        return pos;

    pos.p = 0;     // We are on bit zero of the word
    pos.B = 1;     // We are using the rightmost bit of the word
    pos.s++;       // Increment the index of the word being checked
    pos.S++;       // Increment the pointer to the word being checked

    return pos;
}
So now rename the members of the structure to give them meaning:
Code:
typedef struct alu_bit {
    size_t bit_mask_in_word;    // formerly B
    size_t bit_number_in_reg;   // formerly b
    size_t position_in_word;    // formerly p
    size_t *current_word;       // formerly S
    size_t current_word_index;  // formerly s
} alu_bit_t;
And update the code that uses them:
Code:
struct alu_bit alu_bit_inc( struct alu_bit pos )
{
    pos.bit_number_in_reg++;
    pos.position_in_word++;
    pos.bit_mask_in_word <<= 1;

    if ( pos.bit_mask_in_word == 0 )
    {
        // Move onto the next word
        pos.position_in_word = 0;
        pos.bit_mask_in_word = 1;
        pos.current_word++;
        pos.current_word_index++;
    }

    return pos;
}
Oh, so now it all becomes clear. At this point I can have another look at alu_reg_not() and understand how it works through a block of data, flipping bits one by one.
If I were re-writing everything from scratch, this is what it would look like:
Code:
....

#define REG_DATA_BITS 32
typedef uint32_t reg_data_t;

struct alu {
    uint32_t registers_used; // Number of registers allocated
    reg_data_t **reg_data;   // Array of per-register data pointers, dynamically allocated
    uint32_t *reg_size;      // Array of per-register sizes in bits, dynamically allocated
};
typedef struct alu alu_t;

....

int_t alu_reg_not( alu_t *alu, uint32_t num )
{
    reg_data_t *data;
    uint32_t to_flip;

    if ( !alu )
        return alu_err_null_ptr( "alu" );

    if ( num >= alu->registers_used )
        return alu_err_reg_num( "alu" );

    data = alu->reg_data[num]; // Get the pointer to the data for register 'num'

    for ( to_flip = alu->reg_size[num]; to_flip >= REG_DATA_BITS; to_flip -= REG_DATA_BITS )
    {
        *data = ~(*data); // Flip a whole word's worth of bits
        data++;           // Move onto the next word
    }

    // Now flip any remaining bits
    if ( to_flip > 0 )
    {
        reg_data_t temp = 1;
        temp <<= to_flip;
        *data ^= temp - 1;
    }

    return 0;
}
It might literally be hundreds of times faster: it makes 1/32nd of the loop iterations, and does away with all that mucking around with the "alu_bit" members, plus the call and return overhead of alu_bit_inc() on every single bit.
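One way to gain confidence in the word-wise rewrite is to check it against the bit-by-bit result on a small buffer. A standalone sketch of just the flipping logic (the function names here are made up for the demo, not from the codebase):

```c
#include <stddef.h>
#include <stdint.h>

#define REG_DATA_BITS 32
typedef uint32_t reg_data_t;

/* Flip the low `nbits` bits of `data`, a whole word at a time,
 * then mask-flip the leftover bits -- same shape as the
 * rewritten alu_reg_not() above. */
static void flip_bits_wordwise( reg_data_t *data, uint32_t nbits )
{
    uint32_t to_flip;

    for ( to_flip = nbits; to_flip >= REG_DATA_BITS; to_flip -= REG_DATA_BITS )
    {
        *data = ~(*data); // Flip a whole word's worth of bits
        data++;           // Move onto the next word
    }

    if ( to_flip > 0 )
    {
        reg_data_t temp = (reg_data_t)1 << to_flip;
        *data ^= temp - 1; // Mask of to_flip low ones
    }
}

/* Reference implementation: flip the same bits one at a time. */
static void flip_bits_bitwise( reg_data_t *data, uint32_t nbits )
{
    for ( uint32_t i = 0; i < nbits; i++ )
        data[i / REG_DATA_BITS] ^= (reg_data_t)1 << (i % REG_DATA_BITS);
}
```

Both routines must leave the buffer identical for any bit count, including counts that don't land on a word boundary.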