Optimizing a hot section of code
Greetings everybody. I'm trying to improve the performance of my program, and according to my profiling, the function below accounts for the majority of the execution time on its own; it is the biggest bottleneck by far. If y'all have a little time to spare, I'd appreciate some ideas on how to optimize it.
The function compresses a 64-bit block of data into 32 bits using hard-coded values in the substitution boxes: each of eight six-bit segments (the top 48 bits of the block) selects a four-bit value from its own box. The 8 boxes are defined in a static 3D array in a header file. I tried making them 8 separate arrays and eliminating the loop, but that didn't improve anything; the compiler is probably unrolling the loop on its own.
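The way a six-bit segment is split into a row and a column (as the ROW and COLUMN macros do) can be checked in isolation. This is a minimal sketch: `sextet_row` and `sextet_column` are hypothetical helper names that mirror the macros. Rows come out as 0..3 and columns as 0..15, which suggests each box is a 4×16 table (something like `g_sboxes[8][4][16]`), although the actual declaration in the header isn't shown.

```c
#include <assert.h>

/* Split a 6-bit S-box input b5 b4 b3 b2 b1 b0 the way the ROW and COLUMN
 * macros do: row = b5 b0 (0..3), column = b4 b3 b2 b1 (0..15). */
static inline unsigned sextet_row(unsigned sextet)
{
    assert(sextet <= 0x3f); /* only six bits are meaningful */
    return ((sextet >> 5) << 1) | (sextet & 0x1);
}

static inline unsigned sextet_column(unsigned sextet)
{
    assert(sextet <= 0x3f);
    return (sextet >> 1) & 0xf;
}
```

For example, the sextet 0x1b (binary 011011) gives row 01 = 1 and column 1101 = 13.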
Originally I had a variable to hold the six-bit segment, but after testing, it turns out it is faster to compute the value every time, and I don't know why. Maybe the compiler ran out of registers and has to spill that variable to memory (an extra MOV) on every iteration. /shrug
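One way to probe the register-pressure guess is a variant that keeps fewer values live: hold the sextet in a local again, but consume the block by shifting it left so the current sextet is always in the top six bits, and build the result by shifting it left four bits per box. Both bit-position counters disappear. This is only a sketch: `compress_local` and the `sboxes` stand-in are hypothetical, the table has to be filled with the real S-box values (each assumed to fit in 4 bits), and whether it is actually faster depends on the compiler and target, so it would need the same profiling as the original.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SBOXES 8

/* Hypothetical stand-in for g_sboxes ([box][row][column]); fill it with the
 * real S-box values. Every entry is assumed to fit in 4 bits. */
static uint8_t sboxes[NUM_SBOXES][4][16];

static inline uint32_t compress_local(uint64_t block)
{
    uint32_t compressed = 0;

    for (unsigned box = 0; box < NUM_SBOXES; box++) {
        /* The current sextet is always the top 6 bits, so no position counter. */
        unsigned sextet = (unsigned)(block >> 58);
        unsigned row = ((sextet >> 5) << 1) | (sextet & 0x1);
        unsigned col = (sextet >> 1) & 0xf;

        assert(sboxes[box][row][col] <= 0xf); /* compiled out with NDEBUG */

        /* Append the new nibble: box 0 ends up in bits 31..28, box 7 in 3..0. */
        compressed = (compressed << 4) | sboxes[box][row][col];
        block <<= 6; /* bring the next sextet to the top */
    }
    return compressed;
}
```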
Code:
```c
#include <stdint.h>

/* BITS_PER_BYTE and g_sboxes are defined in the header (not shown). */

#define SEXTET 6
#define QUARTET 4
#define SEXTET_MASK 0x3f
#define SEXTET_SEGMENT ((block >> blck_bit_pos) & SEXTET_MASK) //get the six bits at blck_bit_pos
#define ROW (((SEXTET_SEGMENT >> 5) << 1) | (SEXTET_SEGMENT & 0x1)) //Row index: the outer two bits of the sextet (b5, b0).
#define COLUMN ((SEXTET_SEGMENT >> 1) & 0xf) //Column index: the inner four bits of the sextet (b4..b1).

static inline uint32_t compress(uint64_t block)
{
    unsigned int byte;          /* doubles as the S-box index (8 boxes) */
    unsigned int blck_bit_pos;  /* shift that brings the current sextet to the bottom */
    unsigned int cmpsd_bit_pos; /* output position of the current 4-bit value */
    register uint32_t compressed;

    blck_bit_pos = (sizeof(block) * BITS_PER_BYTE) - SEXTET;        /* 58 */
    cmpsd_bit_pos = (sizeof(compressed) * BITS_PER_BYTE) - QUARTET; /* 28 */
    byte = 0;
    compressed = 0;

    while(byte < BITS_PER_BYTE) /* one pass per S-box: 8 sextets, 48 bits in total */
    {
        compressed |= (uint32_t)g_sboxes[byte++][ROW][COLUMN] << cmpsd_bit_pos;
        blck_bit_pos -= SEXTET;
        cmpsd_bit_pos -= QUARTET;
    }
    return (compressed);
}
```
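A technique often used for this kind of S-box code (swapped in here, not taken from the post) is to move the row/column bit shuffle and the output shift out of the hot path entirely. At start-up you build one table per box, indexed directly by the raw 6-bit value, with each entry already shifted into its final 4-bit position. The hot path then becomes eight independent shift/mask/load/OR steps with no loop-carried position counters. This is a minimal sketch: all names are hypothetical, the placeholder S-box values are made up, and `compress_ref` restates the original loop only so the two versions can be compared. The flattened tables take 8 × 64 × 4 = 2048 bytes, versus 512 bytes for the original if its entries are one byte each. That should still fit comfortably in L1 cache on typical desktop CPUs, but any speedup has to be confirmed with the same profiler.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SBOXES 8

/* Hypothetical stand-in for g_sboxes: same [box][row][column] shape,
 * made-up values so the sketch compiles on its own. */
static uint8_t sboxes[NUM_SBOXES][4][16];

/* Flattened tables: indexed by the raw 6-bit sextet, each entry already
 * shifted into its 4-bit slot of the 32-bit result. */
static uint32_t sbox_flat[NUM_SBOXES][64];

static void init_placeholder_sboxes(void)
{
    for (unsigned box = 0; box < NUM_SBOXES; box++)
        for (unsigned row = 0; row < 4; row++)
            for (unsigned col = 0; col < 16; col++)
                sboxes[box][row][col] = (uint8_t)((box * 7 + row * 5 + col * 3) & 0xf);
}

/* One-time setup: the row/column bit shuffle and the output shift are
 * done here, once per table entry, instead of on every call. */
static void build_flat_tables(void)
{
    for (unsigned box = 0; box < NUM_SBOXES; box++) {
        unsigned shift = 28 - 4 * box; /* box 0 fills bits 31..28 */
        for (unsigned s = 0; s < 64; s++) {
            unsigned row = ((s >> 5) << 1) | (s & 0x1);
            unsigned col = (s >> 1) & 0xf;
            assert(sboxes[box][row][col] <= 0xf); /* must not spill into the next slot */
            sbox_flat[box][s] = (uint32_t)sboxes[box][row][col] << shift;
        }
    }
}

/* Reference: the original loop's logic, used only for comparison. */
static uint32_t compress_ref(uint64_t block)
{
    uint32_t out = 0;
    for (unsigned box = 0; box < NUM_SBOXES; box++) {
        unsigned s = (unsigned)(block >> (58 - 6 * box)) & 0x3f;
        unsigned row = ((s >> 5) << 1) | (s & 0x1);
        unsigned col = (s >> 1) & 0xf;
        out |= (uint32_t)sboxes[box][row][col] << (28 - 4 * box);
    }
    return out;
}

/* Hot path: per box, one shift, one mask, one load and one OR. The eight
 * lookups do not depend on each other, so the CPU can overlap them. */
static inline uint32_t compress_flat(uint64_t block)
{
    return sbox_flat[0][(block >> 58) & 0x3f]
         | sbox_flat[1][(block >> 52) & 0x3f]
         | sbox_flat[2][(block >> 46) & 0x3f]
         | sbox_flat[3][(block >> 40) & 0x3f]
         | sbox_flat[4][(block >> 34) & 0x3f]
         | sbox_flat[5][(block >> 28) & 0x3f]
         | sbox_flat[6][(block >> 22) & 0x3f]
         | sbox_flat[7][(block >> 16) & 0x3f];
}
```

With the real data, `build_flat_tables()` would run once after the S-boxes are available (or `sbox_flat` could be generated offline as a `static const` table). After that, `compress_flat` replaces `compress`.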