Bits and bytes- Please help.

>> The MinGW port of gcc 3.4.5 refused to compile your original code, reporting that the initialiser of the static variable was not constant.

That's silly. I wonder why it thinks it should be (3.4.2 doesn't seem to care!)?

Just to clarify something a few pages back; Yes unsigned chars is absolutely what I would use. I was too busy making a point about the memory usage difference that I neglected to see I had missed off the "unsigned".
Of course it doesn't really that much anyway.

Sebastiani has the solution I would use.

Quote:

Originally Posted by Sebastiani

That's silly. I wonder why it thinks it should be (3.4.2 doesn't seem to care!)?

On a hunch: did you compile as C++ or as C99? I tried compiling as C++ instead, and there was no errors, but this is the C programming forum.

Hey Sebastiani...

In your timing code, can you please add another function to try out:

Similar in mathematical result as

Code:

int get_first_available_channel_ceil_log( unsigned channels ) { int index = ( int )ceil( log2( channels + 1 ) ); return index ? 33 - index : 0; }

Here is my contest entry to compute ceil ( log2 ( n ) ) faster than using the library math functions.

Code:

int fast_index(unsigned channels) { double c = (double)channels; return channels ? 1055 - (int)(*(long long *)&c >> 52): 0; }

Can you please run it and post the timing results for all the algorithms?

>> On a hunch: did you compile as C++ or as C99? I tried compiling as C++ instead, and there was no errors, but this is the C programming forum.

You're right, I did compile it with a C++ compiler. But I'm still confused as to why this is considered an error (and too lazy at the moment to look at the standard).

>> Here is my contest entry to compute ceil ( log2 ( n ) ) faster than using the library math functions.

It's not really a contest, I was just trying to get an idea of how the algorithms compared. Unfortunately, I don't think that the code you posted is portable for a couple of reasons. For one, the long long type is not guaranteed to be the same size as a double, and besides that, I don't believe the standard even mandates any particular binary representation for doubles. Having said that, it will probably work on *most* implementations. At any rate, I wil take a look at it and see how the function performs, in general.

EDIT:
It looks like the bitwise ceil/log solution is not far behind the lookup table approach (slightly less than twice in running time), so it may be a good candidate for memory-strapped applications such as imbedded systems. It's possible that it could be optimized further to run even faster, but I'll leave it up to you math geniuses to work that out. :p Anyway, here's an updated (C99 compiant) version of the test:

Code:

#include <stdio.h> #include <stdlib.h> #include <math.h> #include <time.h> #define array_size( array ) ( sizeof( array ) / sizeof( array[ 0 ] ) ) void initialize_table( unsigned char table[ 256 ] ) { int index, shift, min, max; for( shift = min = 1; shift < 9; shift++ ) { max = 1 << shift; for( index = min; index < max; index++ ) { table[ index ] = 9 - shift; } min = max; } } int get_first_available_channel_table_lookup( unsigned channels ) { static unsigned char table[ 256 ]; static int initialized = 0; if( !initialized ) { initialize_table( table ); initialized = 1; } return channels == 0 ? 0 : channels < 0x100 ? table[ channels & 0xff ] + 24 : channels < 0x10000 ? table[ ( channels >> 8 ) & 0xff ] + 16 : channels < 0x1000000 ? table[ ( channels >> 16 ) & 0xff ] + 8 : table[ ( channels >> 24 ) & 0xff ]; } int get_first_available_channel_ceil_log( unsigned channels ) { int index = ( int )ceil( log2( channels + 1 ) ); return index ? 33 - index : 0; } int get_first_available_channel_bit_by_bit( unsigned channels ) { unsigned index, mask; for( index = 1, mask = 0x80000000; mask; mask >>= 1, index++ ) { if( channels & mask ) return index; } return 0; } int get_first_available_channel_bitwise_ceil_log( unsigned channels ) { double c = ( double )channels; return channels ? 1055 - ( int )( *( long long * )&c >> 52 ) : 0; } typedef int ( * get_first_available_channel_function )( unsigned ); void test( unsigned* values, size_t length, const char* name, get_first_available_channel_function function, size_t repetitions ) { size_t iterations = repetitions * length; unsigned * v, * e = values + length; clock_t elapsed, stop, start = clock( ); while( repetitions-- ) { v = values; while( v != e ) { function( *v++ ); } } stop = clock( ); elapsed = ( clock_t )( ( ( double ) ( stop - start ) / ( double )CLK_TCK ) * 1000 ); printf( "%s : %d iterations performed in %d milliseconds\n", name, iterations, ( size_t )elapsed ); } int main( void ) { int index; unsigned data[ 256 ]; size_t repetitions = 30000, tests = 10; struct { char* name; get_first_available_channel_function function; } functions[ ] = { { "bitwise_ceil_log", get_first_available_channel_bitwise_ceil_log }, { "ceil_log", get_first_available_channel_ceil_log }, { "table_lookup", get_first_available_channel_table_lookup }, { "bit_by_bit", get_first_available_channel_bit_by_bit } }; srand( time( 0 ) ); while( tests-- ) { for( index = 0; index < array_size( data ); ++index ) { data[ index ] = rand( ); } for( index = 0; index < array_size( functions ); ++index ) { test( data, array_size( data ), functions[ index ].name, functions[ index ].function, repetitions ); } } return 0; }

The one with the 1055 constant in it could be made portable by using frexp:

Code:

int get_first_available_channel_frexp( unsigned channels ) { int exponent; frexp(static_cast<double>(channels), &exponent); return channels ? sizeof(channels) * CHAR_BIT + 1 - exponent : 0; }

Can you add GCC's internal implementation for comparison?

Code:

int get_first_available_channel_gcc_builtin( unsigned channels ) { if (channels) return __builtin_clz (channels); return 0; }

I had to do this for a project a while ago, and tried a few algorithms. None could beat the GCC builtin one. I was working with 64-bit unsigned, though.

>> I had to do this for a project a while ago, and tried a few algorithms. None could beat the GCC builtin one. I was working with 64-bit unsigned, though.

You're right, this turned out to be the fastest so far. Any idea how it's implemented?

Here's an updated version of the test (with verification):

Code:

#include <stdio.h> #include <stdlib.h> #include <math.h> #include <time.h> #include <limits.h> #define array_size( array ) ( sizeof( array ) / sizeof( array[ 0 ] ) ) void initialize_table( unsigned char table[ 256 ] ) { int index, shift, min, max; for( shift = min = 1; shift < 9; ++shift ) { max = 1 << shift; for( index = min; index < max; ++index ) { table[ index ] = 9 - shift; } min = max; } } int get_first_available_channel_table_lookup( unsigned channels ) { static unsigned char table[ 256 ]; static int initialized = 0; if( !initialized ) { initialize_table( table ); initialized = 1; } return channels == 0 ? 0 : channels < 0x100 ? table[ channels & 0xff ] + 24 : channels < 0x10000 ? table[ ( channels >> 8 ) & 0xff ] + 16 : channels < 0x1000000 ? table[ ( channels >> 16 ) & 0xff ] + 8 : table[ ( channels >> 24 ) & 0xff ]; } int get_first_available_channel_ceil_log( unsigned channels ) { int index = ( int )ceil( log2( channels + 1 ) ); return index ? 33 - index : 0; } int get_first_available_channel_bit_by_bit( unsigned channels ) { unsigned index, mask; for( index = 1, mask = 0x80000000; mask; mask >>= 1, ++index ) { if( channels & mask ) return index; } return 0; } int get_first_available_channel_bitwise_ceil_log( unsigned channels ) { double c = ( double )channels; return channels ? 1055 - ( int )( *( long long * )&c >> 52 ) : 0; } int get_first_available_channel_frexp( unsigned channels ) { static int constant = sizeof( channels ) * CHAR_BIT + 1; int exponent; frexp( ( double )channels, &exponent ); return channels ? constant - exponent : 0; } int get_first_available_channel_gcc_builtin( unsigned channels ) { if( channels ) return __builtin_clz( channels ) + 1; return 0; } typedef int ( * get_first_available_channel_function )( unsigned ); struct entry { char* name; get_first_available_channel_function function; int passed; }; void test( unsigned* values, size_t length, struct entry* entry, size_t repetitions ) { int result, control; size_t iterations = repetitions * length; unsigned * v, * e = values + length; for( v = values; v != e; ++v ) { result = entry->function( *v ); control = get_first_available_channel_table_lookup( *v ); if( result != control ) { printf( "Error: %s reported channel in 0x%x to be %d (should be %d)\n", entry->name, *v, result, control ); entry->passed = 0; } } if( !entry->passed ) { printf( "%s : N/A (invalid)\n" ); return; } clock_t elapsed, stop, start = clock( ); while( repetitions-- ) { v = values; while( v != e ) { entry->function( *v++ ); } } stop = clock( ); elapsed = ( clock_t )( ( ( double ) ( stop - start ) / ( double )CLK_TCK ) * 1000 ); printf( "%s : %d iterations performed in %d milliseconds\n", entry->name, iterations, ( size_t )elapsed ); } int main( void ) { int index; unsigned data[ 256 ]; size_t repetitions = 30000, tests = 10; struct entry entries[ ] = { { "frexp", get_first_available_channel_frexp, 1 }, { "gcc_builtin", get_first_available_channel_gcc_builtin, 1 }, { "bitwise_ceil_log", get_first_available_channel_bitwise_ceil_log, 1 }, { "ceil_log", get_first_available_channel_ceil_log, 1 }, { "table_lookup", get_first_available_channel_table_lookup, 1 }, { "bit_by_bit", get_first_available_channel_bit_by_bit, 1 } }; srand( time( 0 ) ); while( tests-- ) { for( index = 0; index < array_size( data ); ++index ) { data[ index ] = rand( ); } for( index = 0; index < array_size( entries ); ++index ) { test( data, array_size( data ), &entries[ index ], repetitions ); } } return 0; }

I ran the test myself, and with -O3 (gcc -O3 -std=c99 test.c -lm),

Quote:

...
Error: bitwise_ceil_log reported channel in 0xe002f85 to be 1055 (should be 5)
Error: bitwise_ceil_log reported channel in 0x68dce8fc to be 1055 (should be 2)
Error: bitwise_ceil_log reported channel in 0x3997ff43 to be 1055 (should be 3)
Error: bitwise_ceil_log reported channel in 0x23c952b to be 1055 (should be 7)
Error: bitwise_ceil_log reported channel in 0xecf79ff to be 1055 (should be 5)
Error: bitwise_ceil_log reported channel in 0x401317cd to be 1055 (should be 2)
Error: bitwise_ceil_log reported channel in 0x141ac60 to be 1055 (should be 8)

If I take it out

Quote:

rexp : 7680000 iterations performed in 200 milliseconds
gcc_builtin : 7680000 iterations performed in 40 milliseconds
ceil_log : 7680000 iterations performed in 830 milliseconds
table_lookup : 7680000 iterations performed in 70 milliseconds
bit_by_bit : 7680000 iterations performed in 120 milliseconds

This is GCC 4.3.3 on 32-bit x86 Linux.

Quote:

Any idea how it's implemented?

Hardware implementation.

Code:

bsrl %edx, %eax

I don't understand why bitwise_ceil_log crapped out. Perhaps there is no "long long" (__int64 or some variant) in your compiler - although you should have gotten hideous syntax errors to that effect. My guess it that the >> 53 shift gives a zero every time for some reason.

No, wait. It's more likely an endian thing where the IEEE 754 floating-point format can not be directly overlaid with the internal representation of a 64-bit integer.

Quote:

Perhaps there is no "long long" (__int64 or some variant) in your compiler

It most certainly does.

I have no idea, either. It only doesn't work if I turn on optimization. Works fine with O0. Meaning it's probably something to do with relying on some kind of undefined behaviour (or GCC bug).