# IEEE floating point arithmetic

This is a discussion on IEEE floating point arithmetic within the C Programming forums, part of the General Programming Boards category.

1. ## IEEE floating point arithmetic

Alright, so I'm trying to make a program that calculates the sum of two numbers with two methods. One is ordinary arithmetic using C's built-in float operators (simple), and the other reimplements IEEE floating point arithmetic on the raw bit patterns using integer operations (not so simple). I'm using basic bit-manipulation operators in C (shifts, bit-mask macros, etc.). The main() code was prewritten for me; I was only tasked with writing the functions. I know the first three functions work properly because I've tested them with a working compute function. Unfortunately, when I plug in my own compute function, it compiles but doesn't work the way it's supposed to at all. Here's my code (the functions are declared in the header, but everything else is in the .c file):

Code:
```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "floatcomp.h"
#define BIT31 ((uint) 0x80000000)
#define BIT30 ((uint) 0x7f800000)
#define BIT22 ((uint) 0x007fffff)
#define BIT0 ((uint) 0x00000000)
#define BIT1 ((uint) 0x00800000)
#define BITALL ((uint) 0xffffffff)

int showmode = 0;

int main(int argc, char *argv[])
{
    float f1, f2, r1, r2;
    char op;
    int stop = 0, nitem;

    if ((argc == 2) && (argv[1][0]=='v'))
        showmode = 1;

    while ((nitem = scanf("%f %c %f", &f1, &op, &f2)) == 3) {
        switch (op) {
        case '+': r1 = f1 + f2; break;
        case '-': r1 = f1 - f2; break;
        case '*': r1 = f1 * f2; break;
        case '/': r1 = f1 / f2; break;
        default:
            stop = 1;
            break;
        }

        if (stop) break;

        if (showmode) {
            printf("\n");
            printf("---- standard computation ---------\n\n");
        }
        printf("%13.6f %c %13.6f = %13.6f\n", f1, op, f2, r1);

        uint xf1 = *(uint *)&f1;
        uint xf2 = *(uint *)&f2;

        if (showmode) {
            printf("\n");
            printf("---- integer based computation ----\n\n");
            printf("  f1: 0x%08x\n", xf1);
            printf("  f2: 0x%08x\n", xf2);
            printf("\n");
        }

        r2 = compute(xf1, xf2, op);

        if (showmode) {
            printf("\n");
        }

        printf("%13.6f %c %13.6f = %13.6f\n\n", f1, op, f2, r2);
    }

    if (nitem != EOF)
        printf("input expression format error\n");
}

uint unpack(uint n, int *s, int *e, int *f) // extracts the sign bit, exponent string, and the mantissa from the number
{
    *s= (BIT31 & n) >> 31;
    *e= (BIT30 & n) >> 23;
    *f= (BIT22 & n);

    if(*e!=0)
        *f = *f | BIT1;

    *e = *e - 127;
}

uint  pack(int s, int e, int f) // opposite of unpack, puts the disassembled binary back together
{
    s = s << 31;
    e = (e + 127) << 23;
    f = (f & BIT22);

    int n = s | e | f;

    return (n);
}

int   first_nzb(int f) //finds left-most non-zero bit and returns location in the bit
{
    int i;
    for(i=31;i>=0;i--)
    {
        if((1<<i) & f)
            return (i);
        if(i==0)
            return(-1);
    }
}

void  normalize(int k, int *e, int *f) //puts the fraction where it's supposed to be and determines if the exponent needs to be accordingly increased or decreased.
{
    int shift = k - 23;
    if (shift > 0)
        *f = *f >> shift;
    else
        *f = *f << (-shift);

    *e = *e + shift;
}

float compute(uint n1, uint n2, char op)
{
    int shift;
    int s1, e1, f1, s2, e2, f2;
    int k, n3, f3, s3, e3;
    unpack(n1, &s1, &e1, &f1); //extract appropriate parts from the bit numbers
    unpack(n2, &s2, &e2, &f2);
    if (op == '-')
    {
        if(n2<0) //if it's a negative number, take two's compliment by toggling numbers and adding one
            n2 = ~n2 + 1;
        if(f1>f2) //if the number being subtracted from is greater, sign is positive
        {
            s3=s1;
        }
        if(f1<f2) //if the number being subtracted is larger, sign is negative
        {
            s3=s2;
        }
    }

    if(e1 > e2)
    {
        //if exponent of n1 is larger, use it's exponent and shift fraction 2 the difference between the exponents
        e3 = e1;
        f2 = f2 << (e1 - e2);
    }
    else if(e2 > e1)//if exponent of n1 is larger, use it's exponent
    {
        //same as last function, just opposite
        e3 = e2;
        f1 = f1 << (e2 - e1);
    }
    else if(e1 == e2)
    {
        if((f1+f2) > 0)
        {
            e3 = e2 + 1; // if the exponents are equal, but the fraction is greater than 0, shift 1
            shift = 1;
        }
        else
            e3 = e2;
    }

    if(op == '+')
    {
        s3=s1; //if adding, just pick first sign, because they're both the same
    }
    f3 = f1 + f2; // add fractions, then call predefined functions to prepare for addition
    if(shift==1) f3 = f3 >> 1;
    normalize(first_nzb(f3),&e3, &f3);
    f3 = f3 & BIT22;
    uint x = pack(s3,e3,f3);
    float *fp;
    fp = &x;
    return *fp;
}
```
I'll admit I'm a little in over my head with this. I usually program in C++, and this is my first experience with bit manipulation. I'm sure I'm making some blatant logic error, but I don't understand the syntax well enough to see what it is. Anyway, the numbers I get are drastically off. Well, the second one at least; the first result (the standard computation) is obviously correct. This is a lab assignment, and the TA said we could ask outside sources for advice, but obviously don't just give me the entire code, or it would defeat the purpose.

Thanks,
Aaron

2. Warnings I got trying to get your code to compile.

Code:
```
H:\SourceCode\Projects\testieee\main.c||In function 'compute':|
H:\SourceCode\Projects\testieee\main.c|166|warning: assignment from incompatible pointer type [enabled by default]|
H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'n3' [-Wunused-variable]|
H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'k' [-Wunused-variable]|
H:\SourceCode\Projects\testieee\main.c||In function 'first_nzb':|
H:\SourceCode\Projects\testieee\main.c|99|warning: control reaches end of non-void function [-Wreturn-type]|
H:\SourceCode\Projects\testieee\main.c||In function 'unpack':|
H:\SourceCode\Projects\testieee\main.c|76|warning: control reaches end of non-void function [-Wreturn-type]|
H:\SourceCode\Projects\testieee\main.c||In function 'main':|
H:\SourceCode\Projects\testieee\main.c|64|warning: control reaches end of non-void function [-Wreturn-type]|
||=== Build finished: 0 errors, 6 warnings (0 minutes, 0 seconds) ===|
```
Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers
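For reference, a normal single-precision value is (-1)^s × 1.f × 2^(e − 127), where s, e and f are the 1-bit sign, 8-bit biased exponent and 23-bit fraction fields. The sketch below applies that formula directly to a bit pattern, which can help when checking hex dumps by hand. It assumes a normal number (exponent field neither 0 nor 255), and the function name `decode_normal` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: value of a *normal* IEEE 754 single-precision bit
   pattern, computed from its fields as (-1)^s * 1.f * 2^(e - 127). */
double decode_normal(uint32_t bits)
{
    uint32_t s = bits >> 31;               /* 1-bit sign */
    uint32_t e = (bits >> 23) & 0xffu;     /* 8-bit biased exponent */
    uint32_t f = bits & 0x007fffffu;       /* 23-bit fraction */

    /* 1.f as an integer: hidden bit plus fraction, i.e. 1.f * 2^23. */
    double value = (double)(f | 0x00800000u);

    /* Multiply by 2^(e - 127 - 23); scaling by powers of two is exact here. */
    int scale = (int)e - 127 - 23;
    while (scale > 0) { value *= 2.0; scale--; }
    while (scale < 0) { value /= 2.0; scale++; }

    return s ? -value : value;
}
```

For example, 0x40400000 has exponent field 128 and fraction 0x400000, so it decodes to 1.5 × 2^1 = 3.0.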

Tim S.

3. Oh, I forgot to say that I'm compiling in the Linux terminal. I only get one warning, and my TA told me it's a necessary one. It's about an incompatible pointer type in the `fp = &x;` assignment.
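That warning comes from storing the address of a `uint` in a `float *`. Reading an object through a pointer to an unrelated type also breaks C's aliasing rules, so the cast-based version is not really "necessary". A common portable alternative is to copy the bytes with `memcpy`. This is a sketch with hypothetical helper names, and it assumes `float` and `uint32_t` are both 32 bits, which holds on the usual platforms but is not guaranteed by the C standard (the `_Static_assert` needs C11).

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 type on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the object representation of a float into a 32-bit integer. */
uint32_t float_to_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Inverse: reinterpret a 32-bit pattern as a float. */
float bits_to_float(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With helpers like these, the end of compute could `return bits_to_float(x);` instead of going through `fp`, and the same idea replaces `*(uint *)&f1` in main. Compilers typically turn such small `memcpy` calls into a plain register move.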

4. If you're only getting one warning, you need to compile with these flags: "-Wall -pedantic".

Also, you could replace
Code:
```c
#define BITALL ((uint) 0xffffffff)

/* to */

#define BITALL (0xffffffffu)
```
and then make the same change to the rest of them.
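The `u` suffix makes each constant unsigned without a cast. While renaming, it may also help to name the masks after the fields they select, since `BIT30`, `BIT22` and `BIT1` describe bit positions only loosely. The names below are hypothetical, and the function checks that the three field masks tile the 32-bit word exactly.

```c
#include <stdbool.h>

#define SIGN_MASK   0x80000000u  /* bit 31: sign (was BIT31) */
#define EXP_MASK    0x7f800000u  /* bits 30..23: biased exponent (was BIT30) */
#define FRAC_MASK   0x007fffffu  /* bits 22..0: fraction (was BIT22) */
#define HIDDEN_BIT  0x00800000u  /* implicit leading 1 of a normal significand (was BIT1) */

/* True if the field masks are pairwise disjoint and together cover all 32 bits. */
bool masks_tile_word(void)
{
    bool disjoint = (SIGN_MASK & EXP_MASK) == 0
                 && (SIGN_MASK & FRAC_MASK) == 0
                 && (EXP_MASK & FRAC_MASK) == 0;
    bool complete = (SIGN_MASK | EXP_MASK | FRAC_MASK) == 0xffffffffu;
    return disjoint && complete;
}
```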

5. I'm new to Linux and unsure how to compile with flags.

Also, I changed the bit declarations; no change.

6. The result seems to be off when compiled under MinGW GCC.
Tim S.

Code:
```c
r2 = compute(xf1, xf2, op);

if (showmode) {
    uint xr1 = *(uint *)&r1;
    uint xr2 = *(uint *)&r2;
    printf("  r1: 0x%08x\n", xr1);
    printf("  r2: 0x%08x\n", xr2);
    printf("\n");
}
```
Code:
```
1.0 + 3.0

---- standard computation ---------

     1.000000 +      3.000000 =      4.000000

---- integer based computation ----

  f1: 0x3f800000
  f2: 0x40400000

  r1: 0x40800000
  r2: 0x40e00000

     1.000000 +      3.000000 =      7.000000

3.0 - 1.0

---- standard computation ---------

     3.000000 -      1.000000 =      2.000000

---- integer based computation ----

  f1: 0x40400000
  f2: 0x3f800000

  r1: 0x40000000
  r2: 0x40e00000

     3.000000 -      1.000000 =      7.000000
```
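Both r2 values are 0x40e00000, which is 1.75 × 2^2 = 7.0. Tracing 1.0 + 3.0 through the posted compute shows one way to get there: since e2 > e1, it keeps e2 = 1 but shifts f1 left (0x800000 becomes 0x1000000), so the sum 0x1000000 + 0xC00000 = 0x1C00000 normalizes to 1.75 × 2^2. The 3.0 - 1.0 case takes the mirror-image path and lands on the same bits, because compute always adds the fractions (`f3 = f1 + f2`), even for `'-'`. The usual alignment keeps the larger exponent and shifts the significand with the smaller exponent to the right. This is a minimal sketch of just that step with a hypothetical function name; bits shifted out are simply dropped, whereas correct IEEE rounding would keep guard, round and sticky bits.

```c
#include <stdint.h>

/* Hypothetical helper: bring two unpacked significands (hidden bit included)
   to a common exponent by right-shifting the one with the smaller exponent.
   Returns the common exponent. Shifted-out bits are discarded (no rounding). */
int align_significands(int e1, uint32_t *f1, int e2, uint32_t *f2)
{
    if (e1 >= e2) {
        int d = e1 - e2;
        *f2 = (d < 32) ? (*f2 >> d) : 0;   /* shifting a 32-bit value by >= 32 is undefined */
        return e1;
    } else {
        int d = e2 - e1;
        *f1 = (d < 32) ? (*f1 >> d) : 0;
        return e2;
    }
}
```

With 1.0 (e = 0, f = 0x800000) and 3.0 (e = 1, f = 0xC00000) this gives exponent 1 and significands 0x400000 and 0xC00000, whose sum 0x1000000 normalizes to 1.0 × 2^2 = 4.0.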

7. I strongly suggest testing the good compute function once more with "0.0 + 0.0", "1.0 + 3.0", and "3.0 - 1.0" to confirm your pack and unpack are correct; your current code does a really bad job on those.

Tim S.

8. Checked it with the solid compute function again and all of those additions work perfectly. The error definitely isn't in the pack and unpack functions.

Originally Posted by stahta01
Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers

Tim S.

I've tried both stripping away the 1 bit as well as making sure it's set to 1 and neither changes the output.
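That is consistent with the posted pack: it keeps only `f & BIT22`, the low 23 bits, so the hidden bit (bit 23) is discarded whether or not it is set, and compute also masks f3 with BIT22 before packing. Where the hidden bit does matter is earlier, when significands are aligned and added, because it supplies each operand's leading 1. Below is a small sketch of the same packing rule with a hypothetical name; unlike the posted version it also masks the biased exponent to 8 bits and uses unsigned arithmetic so the shifts are well-defined.

```c
#include <stdint.h>

/* Packing rule as in the thread's pack(): sign bit, biased exponent, and
   only the low 23 bits of f. e is the unbiased exponent. */
uint32_t pack_bits(int s, int e, uint32_t f)
{
    return ((uint32_t)(s & 1) << 31)
         | ((uint32_t)((e + 127) & 0xff) << 23)
         | (f & 0x007fffffu);   /* hidden bit (bit 23) is dropped here */
}
```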

9. Note that this
Code:
`*s = (BIT31 & n) >> 31;`
may result in a value of -1 (all ones) or 1, depending on your system. How about
Code:
`*s = (BIT31 & n) != 0;`
This will never be true since n2 is an unsigned int:
Code:
`if(n2<0)`
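Negative IEEE values are not stored in two's complement either: the sign is a separate bit, so `~n2 + 1` on the bit pattern produces an unrelated value. Testing or flipping the sign means working with bit 31 alone. A minimal sketch with hypothetical names, using the `!= 0` form for the sign test:

```c
#include <stdint.h>

/* 1 if the sign bit of an IEEE single bit pattern is set, else 0. */
int sign_of(uint32_t bits)
{
    return (bits & 0x80000000u) != 0;
}

/* Negate by flipping only the sign bit; exponent and fraction are unchanged. */
uint32_t negate_bits(uint32_t bits)
{
    return bits ^ 0x80000000u;
}
```

For subtraction, one option is to negate the second operand this way and reuse the addition path, which then has to handle operands of opposite sign by subtracting the aligned significands.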
Comparing f1 and f2 the way you do to determine the sign is not going to work since you're not considering the exponents.
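One way to compare magnitudes that does account for the exponents: for single-precision values that are not NaN, clearing the sign bit and comparing the remaining 31 bits as unsigned integers orders the values by magnitude, because the biased exponent sits above the fraction. A sketch with a hypothetical name:

```c
#include <stdint.h>

/* Returns >0 if |a| > |b|, 0 if equal, <0 if |a| < |b|.
   Valid for non-NaN bit patterns (zeros, denormals, normals, infinities). */
int compare_magnitude(uint32_t a, uint32_t b)
{
    uint32_t ma = a & 0x7fffffffu;   /* drop the sign bit */
    uint32_t mb = b & 0x7fffffffu;
    return (ma > mb) - (ma < mb);
}
```

For example, 1.5 (0x3fc00000) has the larger unpacked significand (0xC00000 versus 0x800000) but a smaller magnitude than 2.0 (0x40000000), which comparing f1 and f2 alone gets wrong.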

It makes more sense to call first_nzb from within normalize instead of in its caller parameters.
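Moving the leading-bit search inside normalize also gives one place to handle carries: when the sum of two aligned significands reaches bit 24, normalize shifts it right and bumps the exponent, so the separate `shift` flag in compute is unnecessary. That flag is also read uninitialized in some branches (for example whenever e1 != e2). A minimal sketch with hypothetical names; it uses unsigned types so that testing bit 31 is well-defined (`1 << 31` overflows a signed int), and right shifts truncate instead of rounding.

```c
#include <stdint.h>

/* Index of the highest set bit, or -1 if f == 0 (same contract as first_nzb). */
int highest_set_bit(uint32_t f)
{
    for (int i = 31; i >= 0; i--)
        if (f & ((uint32_t)1 << i))
            return i;
    return -1;
}

/* Shift f so its leading 1 lands on bit 23 (the hidden-bit position) and
   adjust the exponent to compensate. A zero significand is left untouched,
   since it has no leading 1; the caller must special-case zero. */
void normalize_sig(int *e, uint32_t *f)
{
    int k = highest_set_bit(*f);
    if (k < 0)
        return;
    int shift = k - 23;
    if (shift > 0)
        *f >>= shift;
    else
        *f <<= -shift;
    *e += shift;
}
```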

Anyway, those are a couple of points I can see, but I haven't looked in detail at your logic.

10. Focus on one calculation at a time, step through things with the debugger and see where it goes wrong.

Perhaps start with 0.0 + 0.0 as that should be easy to debug.

Do you need this to work correctly for denormals, infinities, or NaNs?
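The exponent and fraction fields are enough to tell these cases apart. An exponent field of 0 means zero (fraction 0) or a denormal (fraction nonzero, no hidden bit, effective exponent −126, whereas the posted unpack gives it −127). An exponent field of 255 means infinity (fraction 0) or NaN (fraction nonzero). A sketch of that classification with hypothetical names, which a compute function could run before the normal-number path:

```c
#include <stdint.h>

enum fclass { CLASS_ZERO, CLASS_DENORMAL, CLASS_NORMAL, CLASS_INFINITY, CLASS_NAN };

/* Classify an IEEE 754 single-precision bit pattern by its fields. */
enum fclass classify_bits(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xffu;   /* biased exponent field */
    uint32_t f = bits & 0x007fffffu;     /* fraction field */

    if (e == 0)
        return f == 0 ? CLASS_ZERO : CLASS_DENORMAL;
    if (e == 0xffu)
        return f == 0 ? CLASS_INFINITY : CLASS_NAN;
    return CLASS_NORMAL;
}
```

In the posted code, 0.0 unpacks to an exponent of -127 with fraction 0, and first_nzb(0) returns -1, so 0.0 + 0.0 reaches normalize with a shift of -24; checking for CLASS_ZERO first avoids that path.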

11. Yeah, I need to make allowances for those cases, but I'm focusing on getting the basic functionality working first. I made some tweaks that give me the proper numbers. Unfortunately, it only works for a few calculations in a row; then it starts doing wonky things to the sum, like dividing it by 2...

12. Since you've given no indication of noticing, please note that I've pointed out a couple of errors in your code in my post above.

13. If you want more help then you need to show updated code. At the moment I'm assuming you've solved the problem on your own and not told us.