efficient way to calc log base 2 of 64-bit unsigned int

This is a discussion on efficient way to calc log base 2 of 64-bit unsigned int within the C++ Programming forums, part of the General Programming Boards category; Hi, Is there a more efficient way to calculate the log base 2 of a 64-bit unsigned integer than looping ...

  1. #1
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183

    efficient way to calc log base 2 of 64-bit unsigned int

    Hi,
    Is there a more efficient way to calculate the log base 2 of a 64-bit unsigned integer than looping through the bits? I couldn't find an algorithm on the internet that works on 64-bit numbers.

    PS. assume that the input is a power of 2 (exactly one bit is set)

    Thank you very much

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,834
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    /* Returns the 1-based position of the highest set bit (0 for input 0).
       Each test halves the remaining range of bit positions, so this is a
       binary search over the bits. */
    int findBit ( unsigned long val ) {
        int result = 0;
        if ( val >= 0x10000 ) {     /* top bit somewhere in the upper 16 bits? */
            result += 16;
            val >>= 16;
        }
        if ( val >= 0x100 ) {
            result += 8;
            val >>= 8;
        }
        if ( val >= 0x10 ) {
            result += 4;
            val >>= 4;
        }
        if ( val >= 0x4 ) {
            result += 2;
            val >>= 2;
        }
        if ( val >= 0x2 ) {
            result += 1;
            val >>= 1;
        }
        return result + val;        /* val is now 0 or 1 */
    }
    
    int main ( void ) {
        unsigned long tests[] = {
            0, 0x1, 0x100000, 0x400000, 0x80000000
        };
        size_t i;
        for ( i = 0 ; i < sizeof tests / sizeof *tests ; i++ ) {
            printf("%08lx = %d\n", tests[i], findBit(tests[i]) );
        }
    
        return 0;
    }
    
    $ gcc foo.c
    $ ./a.exe
    00000000 = 0
    00000001 = 1
    00100000 = 21
    00400000 = 23
    80000000 = 32
    64 bits is just one more test.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    thank you for the code, took me a while to understand, but I think I do now.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    The "val >=" and "result +=" lines can be implemented using bitwise & and |=, resp.

  5. #5
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    thank you for the suggestion. I changed the code accordingly:
    Code:
    unsigned int findBit ( Uint64 bb ) {
        unsigned int result = 0;
    
        /* Same binary search as before, but each ">=" test is done with a
           bitwise AND against a mask, and each "+=" with "|=" (the
           increments are distinct powers of two, so they never carry). */
        if ( bb & ~0x1FFFFFFFFLL ) {
            result |= 32;
            bb >>= 32;
        }
        if ( bb & ~0x1FFFF ) {
            result |= 16;
            bb >>= 16;
        }
        if ( bb & ~0x1FF ) {
            result |= 8;
            bb >>= 8;
        }
        if ( bb & ~0x1F ) {
            result |= 4;
            bb >>= 4;
        }
        if ( bb & ~0x7 ) {
            result |= 2;
            bb >>= 2;
        }
        if ( bb & ~0x3 ) {
            result |= 1;
            bb >>= 1;
        }
        return (result + bb);
    }
    however, to my surprise, benchmarking (a simple loop that executes this function many times) shows that the improvement in efficiency is unnoticeable or even non-existent. Is this what is supposed to happen? Or has my compiler (gcc 4.1) done some magic to the old code?

    Thank you

  6. #6
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,581
    bitwise operations don't see the performance gains they did in the old days.

    Always a nice thing to do, I expect, especially when there are plans to turn new code into old code or to use new code on old systems. However, as far as modern compilers (and processors?) are concerned, explicit bitwise operations in source code don't have much of an impact any more in most common situations.
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    I'm surprised. I wasn't sure about the "val >=" change, but I was pretty sure that the compiler wasn't smart enough to realize that each "result +=" could be replaced with |= (since that requires seeing that the different increments have no bits in common, and that the initial value of result is 0), and that |= would be faster than +=.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by robatino View Post
    I'm surprised. I wasn't sure about the "val >=" change, but I was pretty sure that the compiler wasn't smart enough to realize that each "result +=" could be replaced with |= (since that requires seeing that the different increments have no bits in common, and that the initial value of result is 0), and that |= would be faster than +=.
    The compiler probably doesn't replace += with |=, but += and |= take the same number of clocks either way on a modern x86-based processor: one cycle of latency for the actual OR or ADD, plus whatever additional time it takes to access the arguments (in this example none at all, since one side is a constant and result is most likely in a register, or at worst cached, so not much latency there either).

    Where it may make a difference is if you're OR-ing data that is bigger than a register, say a 32-bit value on a 16-bit x86 processor. But that was a while back; I'm pretty sure OR and ADD take the same number of cycles even on a 486.

    --
    Mats

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,834
    If you wanted to go "all out", and you're using an x86, there's always the "BSR—Bit Scan Reverse" instruction to play with.

    > it appears that the improvement in efficiency is unnoticeable or even non-existant.
    > Is this what is supposed to happen?
    Why would you imagine that there would be some vast difference in performance?


    Personally, I prefer mine, because it's more intuitively obvious (well I think so) as to what is going on.

  10. #10
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    thank you for all your comments

    I was under the (wrong) impression that bitwise operations are faster than add.

    thank you for clarifying it for me

    in this case, I would prefer Salem's original version, too, for the readability.

    by the way, i am using this function in the inner loop of a chess playing program, that's why I wanted to optimize it

  11. #11
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    Where it may make a difference is if you're OR-ing data that is bigger than a register, say a 32-bit value on a 16-bit x86 processor. But that was a while back; I'm pretty sure OR and ADD take the same number of cycles even on a 486.
    so it would matter if I am OR-ing to 64-bit data on a 32-bit CPU? what kind of difference would that make?

    Thank you

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by cyberfish View Post
    so it would matter if I am OR-ing to 64-bit data on a 32-bit CPU? what kind of difference would that make?

    Thank you
    Whilst testing this out, I've found that Visual Studio VC7 doesn't handle it very well [because it touches the second half of the 64-bit word anyway]. But when I tried it in assembler, the result wasn't noticeably better, most likely due to bad branch prediction. If I use a constant instead of a variable, the assembler version is much faster than the C version with variable input, but then the compiler realizes the input is constant and hoists the C version out of the loop, so the loop takes "zero time" to run!

    --
    Mats

  13. #13
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    I see... thank you

  14. #14
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,274
    Quote Originally Posted by cyberfish View Post
    Hi,
    Is there a more efficient way to calculate the log base 2 of a 64-bit unsigned integer than looping through the bits? I couldn't find an algorithm on the internet that works on 64-bit numbers.
    My advice is to keep the board reading code as simple and understandable as possible. Look for performance gains by enhancing your evaluation function, using a transposition table, and improving move ordering for alpha-beta search. You can get very large gains this way, whereas optimizing the board access function will lead to only a linear speedup. You're not going to make your code twice as fast by fiddling with this function.

    One of the NASA engineers who designed the control code for the antenna beam on the space shuttle once told me, "Don't bother making something faster unless you can make it 4 times faster."

  15. #15
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,183
    My advice is to keep the board reading code as simple and understandable as possible. Look for performance gains by enhancing your evaluation function, using a transposition table, and improving move ordering for alpha-beta search. You can get very large gains this way, whereas optimizing the board access function will lead to only a linear speedup. You're not going to make your code twice as fast by fiddling with this function.
    Thank you for your advice, it's much appreciated. I am still on my board handling and move generation code, not yet onto the AI part, but I am already planning various optimizations such as transposition tables, as I will have a month to work on the AI if needed (I am still a high school student).

    One of the NASA engineers who designed the control code for the antenna beam on the space shuttle once told me, "Don't bother making something faster unless you can make it 4 times faster."
    well, the change from my original looping algorithm to this one is a change from linear search to binary search (if I understand the algorithm correctly), and the speedup is well over 4 times =)
