Comparing a lookup table against bit manipulation in terms of resulting code size:
$ arm-elf-gcc --version
arm-elf-gcc (GCC) 4.1.1
$ arm-elf-as --version
GNU assembler 2.17
Code:
char convert(char x)
{
return x << 4 & 16 | x << 2 & 8 | x & 4 | x >> 2 & 2 | x >> 4 & 1;
}
compiled and disassembled becomes
Code:
convert
10 B5 PUSH {R4,LR}
00 06 LSLS R0, R0, #0x18
04 0E LSRS R4, R0, #0x18
A1 00 LSLS R1, R4, #2
08 23 MOVS R3, #8
19 40 ANDS R1, R3
02 22 MOVS R2, #2
83 0E LSRS R3, R0, #0x1A
13 40 ANDS R3, R2
19 43 ORRS R1, R3
10 22 MOVS R2, #0x10
23 01 LSLS R3, R4, #4
13 40 ANDS R3, R2
04 22 MOVS R2, #4
14 40 ANDS R4, R2
C0 00 LSLS R0, R0, #3
23 43 ORRS R3, R4
C0 0F LSRS R0, R0, #0x1F
03 43 ORRS R3, R0
19 43 ORRS R1, R3
08 1C MOVS R0, R1
10 BD POP {R4,PC}
; End of function convert
22 instructions, 44 bytes
Code:
char lookup_table[] = {
0x00, 0x10, 0x08, 0x18, 0x04, 0x14, 0x0C, 0x1C,
0x02, 0x12, 0x0A, 0x1A, 0x06, 0x16, 0x0E, 0x1E,
0x01, 0x11, 0x09, 0x19, 0x05, 0x15, 0x0D, 0x1D,
0x03, 0x13, 0x0B, 0x1B, 0x07, 0x17, 0x0F, 0x1F
};
char convert(char x)
{
return lookup_table[x];
}
compiled and disassembled becomes
Code:
convert
02 4B LDR R3, =lookup_table
00 06 LSLS R0, R0, #0x18
00 0E LSRS R0, R0, #0x18
18 5C LDRB R0, [R3,R0]
70 47 BX LR
; End of function convert
5 instructions, 10 bytes. Plus 32 bytes for the table, for a total of 42 bytes.
compiler switches used were
$ arm-elf-gcc -Os -Xassembler -mcpu=cortex-m3 -mthumb -c test.c -o test.o
The size difference of 2 bytes is next to nothing. But the lookup-table version might be faster, depending on the memory access time to fetch the table entry and whether it is cached, because it executes far fewer instructions.
The lookup function is also much better suited to inlining than the bit-shifting one (in terms of resulting code size).
Do note that these might not be the optimal compiler switches, since I have no experience developing for the Cortex CPUs; a different choice might shift the resulting size in favour of one method or the other.