# Thread: Get Exponent of Float

1. ## Get Exponent of A Float

Hello,
Is there a way to get the exponent of a number without using shift and mask operators?

I can remove the mantissa using division which is equivalent to shifting,
but I am stuck at masking.

Code:
```static const int32_t getExponent(uint32_t binary) {

//Shifting
printf("Exponent = %d\n", ((binary >> 23) & 255) - 127);
printf("Shift = %d\n", binary >> 23);

// binary = binary / 0b100000000000000000000000
binary = binary / 8388608;
printf("Shift by Division = %d\n", binary);

int32_t i = 8, e = 1;
while (--i) {
e *= 2;
if (binary % 2 == 1) {
++e;
}
binary /= 2;
}
return e;
}
int main(int argc, const char *argv[]) {
const float i = 263.3;
printf("# result = %d\n", getExponent(*(uint32_t *)&i));
return 0;
}```

2. Bit-fields or maths.
Code:
```#include <stdio.h>
#include <math.h>
int main ( ) {
union {
float x;
struct {
// assumes bit fields are arranged LSB first
// https://en.wikipedia.org/wiki/Floating-point_arithmetic#Internal_representation
unsigned int mantissa:23;
unsigned int exponent:8;
unsigned int sign:1;
} y;
} v;
v.x = 263.3;
printf("Sign=%u, exponent=%d, mantissa=%u\n",
v.y.sign, (int)v.y.exponent-127, v.y.mantissa);

// A float is just 1.xxx * 2^exponent
// which we can find just by taking log2() of the number.
// Use the identity logx(y) = log(y)/log(x) to calculate the log
// in one base, using the log function of any other base.
int e = (int)floor(log10(v.x)/log10(2));
printf("Calculated=%d\n", e);
}```

3. I don't want to use the math header or a union.
The mask is 0b11111111, which is 8 binary digits, so there must be a way to loop 8 times and extract the biased exponent.

Code:
```int main(int argc, const char *argv[]) {
const float i = 263.3;
printf("Exponent = %d\n", ((*(__uint32_t *)&i / 8388608) & 0b11111111) - 127);
return 0;
}```

4. Code:
```int main(int argc, const char *argv[]) {
const float i = 263.3;
printf("Exponent = %d\n", ((*(__uint32_t *)&i / 8388608) % 256) - 127);
return 0;
}```
Solved, but I still want to understand how to extract the biased exponent by looping 8 times.

5. That's no different to what you started with.
printf("Exponent = %d\n", ((binary >> 23) & 255) - 127);

6. Code:
```static const int32_t getExponent(uint32_t binary) {
uint32_t b, e = 0, i = 8, power;
binary /= 8388608;
while (i--) {
b = binary;
power = i;
while (power--) { // loops 28 times - CPU overloading - too expensive
b /= 2;
}
e *= 2;
if (b % 2 == 1) {
++e;
}
}
return e - 127;
}
int main(int argc, const char *argv[]) {
const float i = 263.3;
printf("%d\n", getExponent(*(int32_t *)&i));
return 0;
}```
The original implementation was faulty because the loop took each remainder from the least significant (right-hand) end of the value, so the bits were assembled into e in reverse order.

The corrected one is expensive and complicated, I dislike it.

7. > while (power--) // loops 28 times - CPU overloading - too expensive
So initialise b once, and do one b /= 2 for each loop iteration.

I've no idea what you're trying to prove here, except how to waste time in the most efficient way possible.

All you're doing is copying one bit at a time from b to e when you could do the whole thing as one 8 bit block using a mask.
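A minimal sketch of that suggestion, assuming a 32-bit IEEE 754 single-precision bit pattern: `b` is set up once and halved once per iteration, and a running place value (`weight`, a name invented here) puts each bit back in its original position. The function name `exponent_by_division` is hypothetical, not something from the posts above.

```c
#include <stdint.h>

/* Hypothetical helper: unbiased exponent of an IEEE 754 single-precision
   bit pattern, using only /, %, *, + and -. */
static int32_t exponent_by_division(uint32_t bits)
{
    uint32_t b = bits / 8388608u;  /* 2^23: drop the 23 mantissa bits */
    uint32_t e = 0;
    uint32_t weight = 1;           /* place value of the bit being copied */

    for (int i = 0; i < 8; i++) {
        e += (b % 2u) * weight;    /* copy the lowest remaining bit */
        weight *= 2u;
        b /= 2u;                   /* one division per iteration */
    }
    /* The sign bit (ninth bit of b) is never copied, so it is ignored. */
    return (int32_t)e - 127;       /* remove the bias */
}
```

For the bit pattern of 263.3f (0x4383A666) this yields 8, the same as the shift-and-mask version. It is still just a slow way of writing `% 256`.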

8. The answer is to use the frexpf() function to decompose a float into exponent and mantissa. Internally it will be doing bit shifting and masking and you can write your own version on similar principles if the goal is to do it without using a library function.

9. You can extract a bit field from an unsigned value using multiply (to force the upper bits off the end, clearing the top part), followed by divide (to force the lower bits off the end, clearing the bottom part). This leaves the bits in the 2**0 position. You can multiply once more if you want them back in their original position:

exponent = bits * (1 << 1);
exponent = exponent / (1 << 24);

// optional:
exponent = exponent * (1 << 23);

With the optional part, this should effectively "mask" the other bits out of the number. Without the optional part, this masks the bits out and shifts them down into the 0..n range.
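As a rough sketch of the multiply-then-divide idea, assuming a 32-bit IEEE 754 single-precision bit pattern held in a `uint32_t` (so the multiplication wraps modulo 2^32). The function names are made up for illustration. Note that after doubling, the exponent field sits in the top eight bits, so the divisor used here is 2^24.

```c
#include <stdint.h>

/* Hypothetical helper: biased exponent (0..255) of a float bit pattern. */
static uint32_t biased_exponent_muldiv(uint32_t bits)
{
    /* Doubling pushes the sign bit off the top; the cast keeps the
       result reduced modulo 2^32 even if int is wider than 32 bits. */
    uint32_t x = (uint32_t)(bits * 2u);
    return x / 16777216u;          /* 2^24: mantissa bits fall off the bottom */
}

/* Hypothetical helper: the exponent field back in its original position,
   i.e. the same result as masking with 0x7F800000. */
static uint32_t exponent_field_in_place(uint32_t bits)
{
    return biased_exponent_muldiv(bits) * 8388608u;  /* 2^23 */
}
```

For 0x4383A666 (263.3f) the first function gives 135 and the second gives 0x43800000.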

10. There is a "portable" way. Using frexpf():

Code:
```float m;
int e;

m = frexpf( value, &e );
printf( "Value=%f -> m=%f, e=%d\n", value, m, e );```
frexpf() returns a value with 0.5 <= |m| < 1 (or 0 for a zero input), and e is the actual exponent plus 1. So, if you want the actual exponent and mantissa for normalized values:
Code:
```m = frexpf( f, &e );
m *= 2.0;
--e;```
But, of course, be careful with zero, NAN and INFINITY.
Here's a simple test with floats:
Code:
```#include <stdio.h>
#include <math.h>

void showfloat ( float f )
{
float m;
int e;

printf( "value=%g ", f );

// Zero, NAN and INFINITY are special (known exponent).
if ( f == 0.0f || ! isfinite ( f ) )
{
printf ( "- Special case.\n" );
return;
}

m = frexpf ( f, &e );
printf ( " (mantissa=%.28g, expoent=%d).\n",
m * 2.0, --e );
}

int main ( void )
{
int n;
float minimum, maximum;

n = 1;
minimum = * ( float * ) &n; // minimum subnormal possible.

n = 0x7f7fffff;
maximum = * ( float * ) &n; // maximum positive normalized possible.

float values[] = { 0.0f,
1.0f,
-1.0f,
minimum,
maximum,
NAN,
INFINITY };

for ( int i = 0;
i < sizeof values / sizeof values[0];
i++ )
showfloat ( values[i] );
}```
Code:
```$ cc -O2 -o test test.c -lm
$ ./test
value=0 - Special case.
value=1  (mantissa=1, expoent=0).
value=-1  (mantissa=-1, expoent=0).
value=1.4013e-45  (mantissa=1, expoent=-149).
value=3.40282e+38  (mantissa=1.99999988079071044921875, expoent=127).
value=nan - Special case.
value=inf - Special case.```

11. The problem to solve was to avoid using Math header, library function and the shift-mask operators.
In that case the only option was to use the division and the remainder % operators.
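For reference, a minimal sketch of the whole-field version with only `/` and `%`, assuming `float` is a 32-bit IEEE 754 single. The remainder has to be taken modulo 256 (2^8) so that the sign bit is discarded and all eight exponent bits survive. The name `float_exponent_divmod` is hypothetical, and `memcpy` is used here only to read the bit pattern without a pointer cast; it is an addition of this sketch, not part of the stated constraints.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unbiased exponent of a float via division and
   remainder. Assumes float is a 32-bit IEEE 754 single. */
static int32_t float_exponent_divmod(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the raw bit pattern */

    /* / 2^23 drops the mantissa, % 2^8 drops the sign bit. */
    return (int32_t)((bits / 8388608u) % 256u) - 127;
}
```

With this, both 263.3f and -263.3f give 8.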

12. Then the only thing left is to multiply or divide by 2 until you get a number 1.0 <= |x| < 2.0, starting with the bias as the count value and adding 1 to that count for each division or subtracting 1 for each multiplication.

It's what you'd do in older programming languages, or higher-level languages that don't know or care about bits.

I think the important point here is to understand that "floating point" is essentially just "scientific notation" that you learned in high school math, but usually in base 2 instead of base ten. (Almost always, but the IBM mainframes I started on used base sixteen...and versions of that hardware are still running legacy code last I looked.) In scientific notation, you were probably taught that the exponent on 10 was the number of times you had to move the decimal point to get it just after the leading nonzero, and negative if the movement was to the left. Binary floating point is the same, except that powers of 2 are used and it's a "binary point" that's being moved. Moving the binary point right is the same as multiplication by 2 and moving it left is the same as division by 2. That's all there is to it.

13. But.... Why?

I mean, it is possible, and I can do it, but why do you want to?

And why the restriction on what elementary operators you can use?

14. Seems like one of those pointless exercises like add two numbers without using +

It seems the OP is lumbered with a tutor who thinks magic tricks is how you teach programming.

15. Originally Posted by ZTik
> The problem to solve was to avoid using Math header, library function and the shift-mask operators.
> In that case the only option was to use the division and the remainder % operators.
So if the absolute value is two or greater you repeatedly divide by two, and if it is less than one you repeatedly multiply by two, until you have a number in the form 1.xxxxxx. The number of steps gives you the exponent (negative if you had to multiply).
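A rough sketch of that loop, working on the value itself rather than its bit pattern. The function name `exponent_by_scaling` is made up here, and returning 0 for zero, NaN and infinity is just a placeholder choice so the loops cannot run forever.

```c
/* Hypothetical helper: unbiased exponent of a finite, nonzero float,
   found by repeated division or multiplication by 2. */
static int exponent_by_scaling(float x)
{
    int e = 0;

    if (x < 0.0f)
        x = -x;                    /* work on the absolute value */

    /* Special cases: NaN (x != x), and zero or infinity (halving
       leaves the value unchanged). Placeholder result: 0. */
    if (x != x || x * 0.5f == x)
        return 0;

    while (x >= 2.0f) {            /* too big: move the binary point left */
        x /= 2.0f;
        e++;
    }
    while (x < 1.0f) {             /* too small: move the binary point right */
        x *= 2.0f;
        e--;
    }
    return e;                      /* now 1.0 <= x < 2.0 */
}
```

For 263.3f the first loop runs eight times, so the result is 8. For 0.15625f (1.25 * 2^-3) the second loop runs three times and the result is -3.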
