Thread: reversing bits in a truncated byte

  1. #1
    Embedded in C...
    Join Date
    Sep 2008
    Location
    Basingstoke, Hampshire
    Posts
    83

    reversing bits in a truncated byte

    Hi,

    I have an embedded application where I read in 5 address lines (AD0-AD4) into 5 I/Os on my microcontroller, and as I seem to want to make life as hard for myself as possible I have connected AD4 to pin 0, AD3 to pin 1 etc.

    This is how I obtain the address from my 16-bit wide port:

    Code:
        PortC_data = GPIO_ReadInputData(GPIOC);
        psu_addr = (PortC_data & 0xFF) & 0x1F;
    When I put in 00001, I get 0x10 instead of 0x01.

    I tried doing a <<4 , but that didn't seem to work, and I have reached the end of my knowledge on bit manipulations.

    Can anyone help me prevent a hardware redesign?

    --dave

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    the simplest way - use mapping table with 256 members, it will be the quickest way to reverse bits
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Embedded in C...
    Join Date
    Sep 2008
    Location
    Basingstoke, Hampshire
    Posts
    83
    Quote Originally Posted by vart View Post
    the simplest way - use mapping table with 256 members, it will be the quickest way to reverse bits
    Thats a good idea, but I think that it may be a bit large considering I have only 32K for my code, maybe the mapping table could be 32 members,as I am ultimately dealing with 5-bit numbers here.

    --dave

  4. #4
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Quote Originally Posted by droseman View Post
    Thats a good idea, but I think that it may be a bit large considering I have only 32K for my code, maybe the mapping table could be 32 members,as I am ultimately dealing with 5-bit numbers here.
    I agree with the lookup table idea. Assuming a char is 8-bit on this cpu, then a 32 byte table (and possibly a 256 byte one as well) is likely to be smaller than any code you write to convert the number by bit manipulation.

  5. #5
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    5 bits you say?
    Code:
    x << 4 & 16 | x << 2 & 8 | x & 4 | x >> 2 & 2 | x >> 4 & 1;

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    A 256 byte hash table in a device with on 32K of RAM? Wow at that rate you'll have used it all up before you've written half of the program!

    You want a variation on one of the tricks from here: Bit Twiddling Hacks
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by iMalc
    A 256 byte hash table in a device with on 32K of RAM? Wow at that rate you'll have used it all up before you've written half of the program!
    What about a 32 byte hash table, since as droseman himself noted that is enough?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Would be interesting to know what cpu this is targetted at, to compare which method uses the least space.

  9. #9
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Don't get how the circuit board works if you cross-wired the address buses, because some other leads too may be hooked up incorrectly. Btw is this a discreet or integrated circuit? And here's my 2c
    Code:
        a = (PortC_data & 0x01)      << 4;
        b = (PortC_data >> 1 & 0x01) << 3;
        c = (PortC_data >> 2 & 0x01) << 2;
        d = (PortC_data >> 3 & 0x01) << 1;
        e = (PortC_data >> 4 & 0x01);
        psu_addr = a | b | c | d | e;
    Whoa! didn't notice the one from nonoob - it takes the cake for using minimal resources.
    Last edited by itCbitC; 03-05-2010 at 04:49 PM.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Sure, 32 bytes would probably be fine.

    Assuming x is a byte, I'll offer the following alternative which has the same instruction count as what nonoob posted earlier. It's derived from the link I posted earlier:
    Code:
    x = ((x >> 1) & 0x05) | ((x & 0x15) << 1);
    x = ((x >> 2) & 0x03) | ((x & 0x23) << 2);
    x = (x << 1) | (x >> 7);
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Embedded in C...
    Join Date
    Sep 2008
    Location
    Basingstoke, Hampshire
    Posts
    83
    Thanks, useful responses as always on here.

    I have to congratulate nonoob, that is a concise solution, but I will have to sit down and work the logic through before I understand it fully.

    Don't get how the circuit board works if you cross-wired the address buses, because some other leads too may be hooked up incorrectly. Btw is this a discreet or integrated circuit?
    Its my hardware design as well, so I can only hold my hands up to my PCB design boo-boo... The issue is that I have connected the MSB of my address lines to the LSB of my IO port and so on, otherwise my sample of code posted in my question would have worked perfectly.

    The processor is an integrated STM32 ARM Cortex-M3 device

    iMalc - What is the purpose of the hex constants in your solution?

    --dave

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by droseman
    I have to congratulate nonoob, that is a concise solution, but I will have to sit down and work the logic through before I understand it fully.
    Basically, nonoob's idea is to shift each of the 5 bits into its final position, then bitwise and the result with just that corresponding set bit (to get rid of everything else). The five penultimate results are bitwise or'ed together to get the final result.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #13
    Embedded in C...
    Join Date
    Sep 2008
    Location
    Basingstoke, Hampshire
    Posts
    83
    Quote Originally Posted by laserlight View Post
    Basically, nonoob's idea is to shift each of the 5 bits into its final position, then bitwise and the result with just that corresponding set bit (to get rid of everything else). The five penultimate results are bitwise or'ed together to get the final result.
    thanks for the explanation laserlight, that makes complete sense now.

    --dave

  14. #14
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by droseman View Post
    iMalc - What is the purpose of the hex constants in your solution?
    The idea of the solution is to swap bits in parallel. The original code it is based on first swaps the even bits with the odd bits, then it swaps every pair of bits with its adjacent pair, then it swaps every arrangement of four bits with its neighbour etc.
    In order to isolate the odd bits, a bitmask of 0x55 can be used. However since we're only interested in 5 of the 8 bits, I reduced the mask to 0x05 and 0x15 respectively. Then to isloate pairs of bits, a mask of 0x33 can be used. But again I left holes in those bitmasks for to the upper 3 bits that we don't care about.

    Here's a rundown on how the bits are reordered:
    Start: 00012345 (number 1-5 indicates which bit of interest it was originally)
    First line: 000000204 | 00103050 -> 00103254 (notice how the bits 2&3 and 4&5 are swapped)
    Second line: 00000032 | 10005400 -> 10005432 (notice how the pairs 32 and 54 are swapped)
    Last line: 00054320 | 00000001 -> 00054321

    Unfortunately it takes as many instructions to reverse 5 bits as it does to reverse 8 because it works best with powers of two. This method works a lot better when reversing a longer sequence of bits though. When reversing 32 bits for example, it only takes 23 instructions, whereas the method nonoob posted takes 47.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  15. #15
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Comparing lookup tables vs bit manipulations in terms of resulting code size:

    $ arm-elf-gcc --version
    arm-elf-gcc (GCC) 4.1.1
    $ arm-elf-as --version
    GNU assembler 2.17
    Code:
    char convert(char x)
    {
    	return x << 4 & 16 | x << 2 & 8 | x & 4 | x >> 2 & 2 | x >> 4 & 1;
    }
    compiled and disassembled becomes
    Code:
                convert
    10 B5       PUSH    {R4,LR}
    00 06       LSLS    R0, R0, #0x18
    04 0E       LSRS    R4, R0, #0x18
    A1 00       LSLS    R1, R4, #2
    08 23       MOVS    R3, #8
    19 40       ANDS    R1, R3
    02 22       MOVS    R2, #2
    83 0E       LSRS    R3, R0, #0x1A
    13 40       ANDS    R3, R2
    19 43       ORRS    R1, R3
    10 22       MOVS    R2, #0x10
    23 01       LSLS    R3, R4, #4
    13 40       ANDS    R3, R2
    04 22       MOVS    R2, #4
    14 40       ANDS    R4, R2
    C0 00       LSLS    R0, R0, #3
    23 43       ORRS    R3, R4
    C0 0F       LSRS    R0, R0, #0x1F
    03 43       ORRS    R3, R0
    19 43       ORRS    R1, R3
    08 1C       MOVS    R0, R1
    10 BD       POP     {R4,PC}
                ; End of function convert
    22 instructions, 44 bytes

    Code:
    char lookup_table[] = {
    	0x00, 0x10, 0x08, 0x18, 0x04, 0x14, 0x0C, 0x1C, 
    	0x02, 0x12, 0x0A, 0x1A, 0x06, 0x16, 0x0E, 0x1E, 
    	0x01, 0x11, 0x09, 0x19, 0x05, 0x15, 0x0D, 0x1D,
    	0x03, 0x13, 0x0B, 0x1B, 0x07, 0x17, 0x0F, 0x1F
    };
    
    char convert(char x)
    {
    	return lookup_table[x];
    }
    compiled and disassembled becomes
    Code:
                convert
    02 4B       LDR     R3, =lookup_table
    00 06       LSLS    R0, R0, #0x18
    00 0E       LSRS    R0, R0, #0x18
    18 5C       LDRB    R0, [R3,R0]
    70 47       BX      LR
                ; End of function convert
    5 instructions, 10 bytes. Plus 32 bytes for the table, for a total of 42 bytes.

    compiler switches used were
    $ arm-elf-gcc -Os -Xassembler -mcpu=cortex-m3 -mthumb -c test.c -o test.o
    The size difference of 2 bytes is next to nothing. But the lookup table version might be faster, depending on memory access times to fetch the table entry and if it's cached or not, because it has much fewer instructions to execute.
    And the lookup function is much more suited for inlining than the bit-shifting one (in terms of resulting code size).

    Do note that I might be wrong about the optimal compiler switches used because I don't have any experience developing for the cortex cpus, and this might shift the resulting size in favour of one method or the other.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. SDLKey to ASCII without unicode support?
    By zacs7 in forum Game Programming
    Replies: 6
    Last Post: 10-07-2007, 03:03 AM
  2. brace-enclosed error
    By jdc18 in forum C++ Programming
    Replies: 53
    Last Post: 05-03-2007, 05:49 PM
  3. About aes
    By gumit in forum C Programming
    Replies: 13
    Last Post: 10-24-2006, 03:42 PM
  4. byte is equal 8 bits???
    By Micko in forum C Programming
    Replies: 3
    Last Post: 10-15-2004, 10:12 AM
  5. error: identifier "byte" is undefined.
    By Hulag in forum C++ Programming
    Replies: 4
    Last Post: 12-10-2003, 05:46 PM