# Thread: floating points

1. ## floating points

I'm trying to write two functions.
One called: void displayFloat(float f);
And the other: float makeFloat(char* f);

I want displayFloat to simply take in a float and display the 32 bits that represent the passed float. For example, if -5.8125 was passed, this would get printed out:
1 10000001 01110100000000000000000

I'd like makeFloat to do the opposite. For example, pass "-101.1101" to the function and it should return -5.8125

I understand I need to do bit manipulation with unsigned ints, since they have 32 bits just like floats, and that I need a union like so:
Code:
typedef union fi {
    unsigned int i;
    float f;
} fi;
If anyone can point me in the right direction with this bit manipulation stuff, that'd be great. In the meantime I'll be reading up on more bit manipulation tutorial material. Thank you!

2. Originally Posted by johngoodman
If anyone can point me in the right direction with this bit manipulation stuff, that'd be great. In the meantime I'll be reading up on more bit manipulation tutorial material. Thank you!
Code:
#include <string.h>   /* memcpy */

unsigned char bits[sizeof(float)];
float x = 3.14f;

memcpy(bits, &x, sizeof(float));

/* now you can use & and | and printf() to examine the bits array */
sizeof(float) is virtually always 4 and the encoding is virtually always IEEE 754, but
this isn't guaranteed on every C platform.
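As a rough sketch of where the memcpy idea can lead: copy the float into a 32-bit unsigned integer instead of a byte array, then pull out the three fields with shifts and masks. The names `float_fields` and `split_float` are made up for this example, and the code assumes a 4-byte float in IEEE 754 single-precision format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 single. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, the implicit leading 1 is not stored */
} float_fields;

/* Assumes sizeof(float) == 4 and IEEE 754 encoding. */
float_fields split_float(float x)
{
    uint32_t bits;
    float_fields out;

    memcpy(&bits, &x, sizeof bits);        /* raw byte copy, no value conversion */
    out.sign     = (bits >> 31) & 0x1u;    /* top bit */
    out.exponent = (bits >> 23) & 0xFFu;   /* next 8 bits */
    out.mantissa = bits & 0x7FFFFFu;       /* low 23 bits */
    return out;
}
```

For -5.8125 this should give sign 1, exponent 129 (that is, 2 + 127), and mantissa 0x3A0000, which matches the bit pattern quoted in the first post.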

3. Originally Posted by johngoodman
if -5.8125 was passed, this would get printed out:
1 10000001 01110100000000000000000

I'd like makeFloat to do the opposite. For example, pass "-101.1101" to the function and it should return -5.8125
That's very much not the opposite though. The opposite would be to accept 1 10000001 01110100000000000000000 and return -5.8125.
You're instead expecting to pass in a fixed-point representation of the number, and include a minus sign for negatives. That's very different.

To make sure these are really opposites of one another, I would write a function which took a float and returned a char* (probably just point to a static buffer and not care about multithreading at this stage), then call that from within displayFloat. This way you can confirm that the output of one, if fed into the other, gives you back what you started with. I did this for my own int to Roman numeral converter and it worked great!

4. This is what I'm running now, but it outputs 1 01111111 11111111111111111111111 which is obviously not -5.8125

Code:

typedef union fi {
    unsigned int i;
    float f;
} fi;

void displayFloat(float f) {
    fi myUnion;
    myUnion.f = f;
    int theBitValue;
    for (int i = 1; i <= 32; i++) {
        theBitValue = bitValue(myUnion.f, i);
        printf("%d", theBitValue);
        if (i == 1) {
            printf(" ");
        }
        if (i == 9) {
            printf(" ");
        }
    }
}

int bitValue(unsigned int num, int index) {
    unsigned int mask = 1 << index;
    return num >> index;
}

5. Never mind, I fixed it by doing a simple bit shift, haha. But I'm still working on makeFloat.
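The fix itself was not posted, so the following is only one plausible version. The code in post 4 has three problems. It passes `myUnion.f` to `bitValue`, which converts the float's value to an integer instead of reinterpreting its bits. `bitValue` returns the shifted number without masking off the single bit. The loop runs the index from 1 to 32, so the lowest bit is skipped and the last shift is by the full width of the type. The helper `formatFloat` is a made-up name added here so the result can be checked as a string, and the sketch assumes a 32-bit `unsigned int` and an IEEE 754 `float`.

```c
#include <stdio.h>

typedef union fi {
    unsigned int i;
    float f;
} fi;

/* Return bit `index` of num, where index 0 is the least significant bit. */
int bitValue(unsigned int num, int index)
{
    return (num >> index) & 1u;
}

/* Hypothetical helper: write "s eeeeeeee mmm...m" into buf.
   buf must have room for 35 bytes (32 bits, 2 spaces, terminator). */
void formatFloat(float f, char *buf)
{
    fi myUnion;
    int pos = 0;

    myUnion.f = f;
    for (int bit = 31; bit >= 0; bit--) {      /* most significant bit first */
        buf[pos++] = bitValue(myUnion.i, bit) ? '1' : '0';
        if (bit == 31 || bit == 23)            /* after the sign, after the exponent */
            buf[pos++] = ' ';
    }
    buf[pos] = '\0';
}

void displayFloat(float f)
{
    char buf[35];

    formatFloat(f, buf);
    printf("%s\n", buf);
}
```

Calling `displayFloat(-5.8125f)` should then print the pattern from the first post, 1 10000001 01110100000000000000000.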

6. So for the makeFloat function I'd like to call it like so: float result = makeFloat("-101.1101"); .... so I pass it that value, which is binary, but I need to convert it to an actual float and return it. The value that is passed is -5.8125, I believe. I just don't know how to write an algorithm for this.

7. Originally Posted by johngoodman
So for the makeFloat function I'd like to call it like so: float result = makeFloat("-101.1101"); .... so I pass it that value, which is binary, but I need to convert it to an actual float and return it. The value that is passed is -5.8125, I believe. I just don't know how to write an algorithm for this.
Sounds like you want to convert binary to decimal. Remember that the mathematical value of a binary integer number is

b_n * 2^n + b_(n-1) * 2^(n-1) + ... + b_0 * 2^0

where b_i is bit i

For the fractional part, you can extend this pattern using negative powers of the base. For example, .101 is equal to

1 * 2^(-1) + 0 * 2^(-2) + 1 * 2^(-3)

Basically, you need to look at each individual digit and its associated power of the representation base (2 for binary), and add the "place-value" of the digit to an accumulator sum initialized to 0.

BTW, it's a bad idea to work with the internal representation (individual bits) of a "float," unless you are writing low-level library code that is allowed to make assumptions about the internal format the architecture uses for floats, like IEEE 754, etc. I'd suggest sticking with simple "float" values and using the mathematical definition to convert.
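As a worked check of the place-value rule, here is the magnitude of the string from the question, 101.1101, summed by hand (the minus sign is applied separately at the end):

```latex
\begin{aligned}
101.1101_2 &= 1\cdot 2^{2} + 0\cdot 2^{1} + 1\cdot 2^{0}
            + 1\cdot 2^{-1} + 1\cdot 2^{-2} + 0\cdot 2^{-3} + 1\cdot 2^{-4} \\
           &= 4 + 1 + 0.5 + 0.25 + 0.0625 \\
           &= 5.8125
\end{aligned}
```

So "-101.1101" does stand for -5.8125, as assumed in the question.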

8. Originally Posted by MacNilly
Sounds like you want to convert binary to decimal. Remember that the mathematical value of a binary integer number is

b_n * 2^n + b_(n-1) * 2^(n-1) + ... + b_0 * 2^0

where b_i is bit i

For the fractional part, you can extend this pattern using negative powers of the base. For example, .101 is equal to

1 * 2^(-1) + 0 * 2^(-2) + 1 * 2^(-3)

Basically, you need to look at each individual digit and its associated power of the representation base (2 for binary), and add the "place-value" of the digit to an accumulator sum initialized to 0.

BTW, it's a bad idea to work with the internal representation (individual bits) of a "float," unless you are writing low-level library code that is allowed to make assumptions about the internal format the architecture uses for floats, like IEEE 754, etc. I'd suggest sticking with simple "float" values and using the mathematical definition to convert.
But how do you traverse the list of bits and find the value it represents? Bit shifting?

9. You don't really need bit shifting[*]. You've said that you pass the binary value as a string ("-101.1101"). So just iterate through the characters, and whenever you see a '1', add the corresponding "place-value". Of course you also have to take care of the binary point and the leading minus sign.

[*] Instead of calculating the "place value" using bit shifting you could also multiply/divide each iteration by 2.

Bye, Andreas
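Putting the suggestions so far together (walk the characters, remember whether the point has been seen, and double or halve as you go), a minimal sketch of makeFloat could look like the one below. It takes `const char *` rather than the `char*` from the first post so that string literals can be passed safely. The local variable names are invented for the example, and stopping at the first unexpected character is an assumption, since the thread does not say how bad input should be handled.

```c
/* Sketch: convert an optionally signed binary string such as "-101.1101"
   to a float using plain arithmetic, with no access to the float's bits.
   Parsing stops at the first character that is not '0', '1', or the
   first '.'. */
float makeFloat(const char *s)
{
    float sum = 0.0f;     /* accumulator for the magnitude */
    float place = 0.5f;   /* place value of the next fractional digit (2^-1) */
    int negative = 0;
    int seenPoint = 0;

    if (*s == '-' || *s == '+') {
        negative = (*s == '-');
        s++;
    }

    for (; *s != '\0'; s++) {
        if (*s == '.' && !seenPoint) {
            seenPoint = 1;
        } else if (*s == '0' || *s == '1') {
            int digit = *s - '0';
            if (!seenPoint) {
                sum = sum * 2.0f + digit;   /* integer part: shift left, add digit */
            } else {
                sum += digit * place;       /* fractional part: add 2^-k if set */
                place /= 2.0f;
            }
        } else {
            break;                          /* unexpected character: stop parsing */
        }
    }
    return negative ? -sum : sum;
}
```

With this, `float result = makeFloat("-101.1101");` should give -5.8125. Inputs with more significant digits than a float can hold will be rounded, which this sketch does not try to detect.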

10. I'd like to help further, but I'm still unclear as to whether you intend the string -> float operation to be the reverse of the float -> string operation OR, whether you don't as your example shows.
If you don't, then I'm uncertain as to whether you realise the limitations of using a totally different representation, and whether you are really ultimately making the correct decision.

Can you please clear up this contradiction between your specified goal and the example given? Which is correct? Are you aware that the fixed-point representation in your example will actually be harder to convert back to a float than if you truly did the exact reverse operation?
What is this really being used for?
