Comparing a lookup table against bit manipulation in terms of resulting code size:
$ arm-elf-gcc --version
arm-elf-gcc (GCC) 4.1.1
$ arm-elf-as --version
GNU assembler 2.17
Code:
char convert(char x)
{
return x << 4 & 16 | x << 2 & 8 | x & 4 | x >> 2 & 2 | x >> 4 & 1;
}
compiled and disassembled becomes
Code:
convert
10 B5 PUSH {R4,LR}
00 06 LSLS R0, R0, #0x18
04 0E LSRS R4, R0, #0x18
A1 00 LSLS R1, R4, #2
08 23 MOVS R3, #8
19 40 ANDS R1, R3
02 22 MOVS R2, #2
83 0E LSRS R3, R0, #0x1A
13 40 ANDS R3, R2
19 43 ORRS R1, R3
10 22 MOVS R2, #0x10
23 01 LSLS R3, R4, #4
13 40 ANDS R3, R2
04 22 MOVS R2, #4
14 40 ANDS R4, R2
C0 00 LSLS R0, R0, #3
23 43 ORRS R3, R4
C0 0F LSRS R0, R0, #0x1F
03 43 ORRS R3, R0
19 43 ORRS R1, R3
08 1C MOVS R0, R1
10 BD POP {R4,PC}
; End of function convert
22 instructions, 44 bytes
Code:
char lookup_table[] = {
0x00, 0x10, 0x08, 0x18, 0x04, 0x14, 0x0C, 0x1C,
0x02, 0x12, 0x0A, 0x1A, 0x06, 0x16, 0x0E, 0x1E,
0x01, 0x11, 0x09, 0x19, 0x05, 0x15, 0x0D, 0x1D,
0x03, 0x13, 0x0B, 0x1B, 0x07, 0x17, 0x0F, 0x1F
};
char convert(char x)
{
return lookup_table[x];
}
compiled and disassembled becomes
Code:
convert
02 4B LDR R3, =lookup_table
00 06 LSLS R0, R0, #0x18
00 0E LSRS R0, R0, #0x18
18 5C LDRB R0, [R3,R0]
70 47 BX LR
; End of function convert
5 instructions, 10 bytes. Plus 32 bytes for the table, for a total of 42 bytes.
compiler switches used were
$ arm-elf-gcc -Os -Xassembler -mcpu=cortex-m3 -mthumb -c test.c -o test.o
The size difference of 2 bytes is next to nothing. But the lookup-table version might be faster, depending on the memory access time to fetch the table entry and whether it is cached, because it executes far fewer instructions.
The lookup function is also much better suited to inlining than the bit-shifting one (in terms of resulting code size).
Do note that these might not be the optimal compiler switches, since I have no experience developing for the Cortex CPUs; a different choice might shift the resulting size in favour of one method or the other.