IEEE floating point arithmetic

**PuddleOfShane** · 03-06-2012

Alright, so I'm trying to write a program that calculates the sum of two numbers using two methods. One is basic integer arithmetic (simple), and the other is IEEE floating point arithmetic (not so simple). I'm using basic C bit manipulation (shifts, bit macros, etc.). The int main() code was prewritten for me; I was just tasked with writing the functions. I know the first three function properly because I've tested them with a working compute function. Unfortunately, when I plug in my own compute function it compiles, but doesn't work the way it's supposed to at all. Here's my code (the functions are declared in the header, but everything else is in the .c file):

Code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "floatcomp.h"
#define BIT31 ((uint) 0x80000000)
#define BIT30 ((uint) 0x7f800000)
#define BIT22 ((uint) 0x007fffff)
#define BIT0 ((uint) 0x00000000)
#define BIT1 ((uint) 0x00800000)
#define BITALL ((uint) 0xffffffff)

int showmode = 0;

int main(int argc, char *argv[])
{
    float f1, f2, r1, r2;
    char op;
    int stop = 0, nitem;

    if ((argc == 2) && (argv[1][0]=='v'))
      showmode = 1;

    while ((nitem = scanf("%f %c %f", &f1, &op, &f2)) == 3) {
      switch (op) {
      case '+': r1 = f1 + f2; break;
      case '-': r1 = f1 - f2; break;
      case '*': r1 = f1 * f2; break;
      case '/': r1 = f1 / f2; break;
      default: 
        stop = 1;
        break;
      }
  
      if (stop) break;

      if (showmode) {
        printf("\n");
        printf("---- standard computation ---------\n\n");
      }
      printf("%13.6f %c %13.6f = %13.6f\n", f1, op, f2, r1);

      uint xf1 = *(uint *)&f1;
      uint xf2 = *(uint *)&f2;

      if (showmode) {
        printf("\n");
        printf("---- integer based computation ----\n\n");
        printf("  f1: 0x%08x\n", xf1);
        printf("  f2: 0x%08x\n", xf2);
        printf("\n");
      }

      r2 = compute(xf1, xf2, op);

      if (showmode) {
        printf("\n");
      }

      printf("%13.6f %c %13.6f = %13.6f\n\n", f1, op, f2, r2);
    }

    if (nitem != EOF)
    printf("input expression format error\n");
}

uint unpack(uint n, int *s, int *e, int *f) // extracts the sign bit, exponent string, and the mantissa from the number
{
    *s= (BIT31 & n) >> 31;
    *e= (BIT30 & n) >> 23;
    *f= (BIT22 & n);

        if(*e!=0)
    *f = *f | BIT1;     

    *e = *e - 127;
}

uint  pack(int s, int e, int f) // opposite of unpack, puts the disassembled binary back together
{
    s = s << 31;
    e = (e + 127) << 23;
    f = (f & BIT22);

    int n = s | e | f;

    return (n);    
}

int   first_nzb(int f) //finds left-most non-zero bit and returns location in the bit
{
    int i;
    for(i=31;i>=0;i--)
    {
        if((1<<i) & f)
            return (i);
        if(i==0)
            return(-1);
    }
}

void  normalize(int k, int *e, int *f) //puts the fraction where it's supposed to be and determines if the exponent needs to be increased or decreased accordingly.
{
    int shift = k - 23;
    if (shift > 0)
        *f = *f >> shift;
    else
        *f = *f << (-shift);
    
    *e = *e + shift;
}

float compute(uint n1, uint n2, char op)
{
    int shift;
    int s1, e1, f1, s2, e2, f2; 
    int k, n3, f3, s3, e3;
    unpack(n1, &s1, &e1, &f1); //extract appropriate parts from the bit numbers
    unpack(n2, &s2, &e2, &f2);
    if (op == '-')
    {
        if(n2<0) //if it's a negative number, take two's complement by toggling bits and adding one
        n2 = ~n2 + 1;
        if(f1>f2) //if the number being subtracted from is greater, sign is positive
        {
            s3=s1;
        }
        if(f1<f2) //if the number being subtracted is larger, sign is negative
        {
            s3=s2;
        }
    }

    if(e1 > e2)
    { 
                //if exponent of n1 is larger, use its exponent and shift fraction 2 by the difference between the exponents
        e3 = e1;
        f2 = f2 << (e1 - e2); 
    }
    else if(e2 > e1)//if exponent of n2 is larger, use its exponent
    { 
                //same as last function, just opposite
        e3 = e2;
        f1 = f1 << (e2 - e1); 
    }
    else if(e1 == e2)
    {
        if((f1+f2) > 0)
        {
            e3 = e2 + 1; // if the exponents are equal, but the fraction is greater than 0, shift 1
            shift = 1;
        }
        else
        e3 = e2;
    }

    if(op == '+')
    {
        s3=s1; //if adding, just pick first sign, because they're both the same
    }
    f3 = f1 + f2; // add fractions, then call predefined functions to prepare for addition
    if(shift==1) f3 = f3 >> 1;
    normalize(first_nzb(f3),&e3, &f3);
    f3 = f3 & BIT22;
    uint x = pack(s3,e3,f3);
    float *fp;
    fp = &x;
    return *fp;
}

I'll admit, I'm a little bit in over my head with this. I usually program in C++, and this is my first experience with bit manipulation commands. I'm sure I'm making some blatant logic error, but I just don't understand the syntax well enough to see what it is. Anyways, the numbers I get are drastically off. Well, the second one at least, obviously the addition done by integer arithmetic (the first number it returns) is correct. This is a lab assignment, and the TA said we could ask outside sources for advice, but, obviously, don't just give me the entire code, or it would defeat the purpose.

Thanks,
Aaron

**stahta01** · 03-06-2012

Warnings I got trying to get your code to compile.

Code:

H:\SourceCode\Projects\testieee\main.c||In function 'compute':|
H:\SourceCode\Projects\testieee\main.c|166|warning: assignment from incompatible pointer type [enabled by default]|
H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'n3' [-Wunused-variable]|
H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'k' [-Wunused-variable]|
H:\SourceCode\Projects\testieee\main.c||In function 'first_nzb':|
H:\SourceCode\Projects\testieee\main.c|99|warning: control reaches end of non-void function [-Wreturn-type]|
H:\SourceCode\Projects\testieee\main.c||In function 'unpack':|
H:\SourceCode\Projects\testieee\main.c|76|warning: control reaches end of non-void function [-Wreturn-type]|
H:\SourceCode\Projects\testieee\main.c||In function 'main':|
H:\SourceCode\Projects\testieee\main.c|64|warning: control reaches end of non-void function [-Wreturn-type]|
||=== Build finished: 0 errors, 6 warnings (0 minutes, 0 seconds) ===|
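
The "control reaches end of non-void function" warnings for first_nzb and unpack mean there is a path where the function ends without a return. As a hedged sketch (not the poster's code): first_nzb can return -1 after the loop rather than inside it, and unpack could simply be declared void since it returns nothing. The uint32_t parameter and the unsigned mask are my assumptions; `1 << 31` on a 32-bit signed int is undefined behaviour in C, so the sketch shifts an unsigned 1 instead.

```c
#include <stdint.h>

/* Returns the index (0..31) of the highest set bit of f, or -1 if f is 0.
 * Sketch only: the unsigned parameter type is a change from the posted code. */
int first_nzb(uint32_t f)
{
    int i;

    for (i = 31; i >= 0; i--) {
        if (f & ((uint32_t)1 << i))
            return i;
    }
    return -1;  /* reached only when no bit is set */
}
```

With this shape every path returns a value, so the warning for first_nzb should go away.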

Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers

The leading 1 bit is omitted because it contains no information. Since all numbers except zero start with a leading 1, the leading 1 is left implicit.

Do you have to add back this leading 1 bit in your program?
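
For what it's worth, the posted unpack already ORs in BIT1 when the exponent field is non-zero, which is the hidden bit. A standalone sketch of the same idea, assuming a 32-bit IEEE 754 single; unpack_bits is a hypothetical name, and zero and denormals get no special treatment here beyond leaving the hidden bit off.

```c
#include <stdint.h>

/* Split a 32-bit IEEE 754 single into sign, unbiased exponent and
 * 24-bit significand. Sketch: values with exponent field 0 (zero and
 * denormals) keep no hidden bit and are not otherwise special-cased. */
void unpack_bits(uint32_t n, int *s, int *e, uint32_t *f)
{
    uint32_t exp_field = (n >> 23) & 0xffu;

    *s = (int)(n >> 31);          /* 0 or 1, because n is unsigned */
    *f = n & 0x007fffffu;         /* the 23 stored fraction bits */
    if (exp_field != 0)
        *f |= 0x00800000u;        /* restore the implicit leading 1 (bit 23) */
    *e = (int)exp_field - 127;    /* remove the exponent bias */
}
```

For 3.0 (0x40400000) this gives sign 0, exponent 1 and significand 0xc00000, i.e. binary 1.1 times 2^1.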

Tim S.

**PuddleOfShane** · 03-06-2012

Oh, forgot to say that I'm compiling in the Linux terminal. I only get one warning, and I've been told by my TA it's a necessary one. It's about an incompatible pointer type where fp is assigned.
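
That warning comes from `fp = &x;`, which assigns the address of a uint to a float pointer. One common way to reinterpret the bits without a pointer cast is memcpy. This is only a sketch, assuming float is a 32-bit IEEE 754 single on the target; float_to_bits and bits_to_float are hypothetical names, not part of the assignment.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
 * Assumes sizeof(float) == 4. */
uint32_t float_to_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* The inverse: build a float from a 32-bit pattern. */
float bits_to_float(uint32_t bits)
{
    float x;

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With these, the end of compute could be written as `return bits_to_float(pack(s3, e3, f3));` with no pointer of the wrong type involved.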

**memcpy** · 03-06-2012

If you're only getting one warning, you need to compile with these flags: "-Wall -pedantic".

Also, you could replace

Code:

#define BITALL ((uint) 0xffffffff)

/* to */

#define BITALL (0xffffffffu)

and then make the same change to the rest of them

**PuddleOfShane** · 03-06-2012

I'm new to Linux and I'm unsure how to compile with flags.

Also, I changed the bit declarations, no change.
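
On the flags question: they are just extra words on the gcc command line. A minimal sketch, assuming the source file is called floatcomp.c (a guess based on the header name); the command sits in the leading comment, and the body shows the suffix form of the masks suggested above. SIGN_MASK, EXP_MASK, FRAC_MASK and show_masks are hypothetical names.

```c
/* Build with warnings enabled (the file name is an assumption):
 *     gcc -Wall -pedantic floatcomp.c -o floatcomp
 */
#include <stdio.h>

/* The u suffix makes each constant unsigned without needing a cast. */
#define SIGN_MASK 0x80000000u   /* bit 31 */
#define EXP_MASK  0x7f800000u   /* bits 30..23 */
#define FRAC_MASK 0x007fffffu   /* bits 22..0 */

/* Print the three masks combined; together they cover all 32 bits. */
void show_masks(void)
{
    printf("0x%08x\n", SIGN_MASK | EXP_MASK | FRAC_MASK);
}
```

Switching from the cast to the suffix is a style change only, so it is not surprising that the output stayed the same.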

**stahta01** · 03-06-2012

The result also seems to be off when compiled under MinGW GCC.
Tim S.

Note: Added lines to this block to see more info.

Code:

      r2 = compute(xf1, xf2, op);

      if (showmode) {
        uint xr1 = *(uint *)&r1;
        uint xr2 = *(uint *)&r2;
        printf("  r1: 0x%08x\n", xr1);
        printf("  r2: 0x%08x\n", xr2);
        printf("\n");
      }

Code:

1.0 + 3.0

---- standard computation ---------

     1.000000 +      3.000000 =      4.000000

---- integer based computation ----

  f1: 0x3f800000
  f2: 0x40400000

  r1: 0x40800000
  r2: 0x40e00000

     1.000000 +      3.000000 =      7.000000

3.0 - 1.0

---- standard computation ---------

     3.000000 -      1.000000 =      2.000000

---- integer based computation ----

  f1: 0x40400000
  f2: 0x3f800000

  r1: 0x40000000
  r2: 0x40e00000

     3.000000 -      1.000000 =      7.000000
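
Tracing the posted compute by hand on 1.0 + 3.0 suggests where 0x40e00000 comes from: the operand with the smaller exponent is shifted left instead of right, so 0x800000 << 1 plus 0xc00000 gives 0x1c00000, which normalizes to exponent 2 with stored fraction 0x600000, i.e. 7.0. The same trace gives 7.0 for 3.0 - 1.0 because the fractions are always added. Below is a minimal sketch of the alignment step only, for two positive normal operands, truncating instead of rounding. add_positive is a hypothetical name, and this is deliberately not the full compute for the assignment.

```c
#include <stdint.h>

/* Add two positive, normal IEEE 754 singles given as bit patterns.
 * Sketch only: no signs, no rounding, no zero/denormal/inf/NaN handling,
 * no check for exponent overflow. */
uint32_t add_positive(uint32_t a, uint32_t b)
{
    int ea = (int)((a >> 23) & 0xffu);               /* biased exponents */
    int eb = (int)((b >> 23) & 0xffu);
    uint32_t fa = (a & 0x007fffffu) | 0x00800000u;   /* 24-bit significands */
    uint32_t fb = (b & 0x007fffffu) | 0x00800000u;
    uint32_t f;
    int e;

    /* Align: shift the significand with the SMALLER exponent to the right. */
    if (ea >= eb) {
        int d = ea - eb;
        fb = (d < 32) ? (fb >> d) : 0;
        e = ea;
    } else {
        int d = eb - ea;
        fa = (d < 32) ? (fa >> d) : 0;
        e = eb;
    }

    f = fa + fb;                  /* at most 25 bits wide */
    if (f & 0x01000000u) {        /* carry into bit 24: renormalize */
        f >>= 1;
        e += 1;
    }
    return ((uint32_t)e << 23) | (f & 0x007fffffu);
}
```

With f1 = 0x3f800000 and f2 = 0x40400000 this returns 0x40800000, which matches r1 in the output above.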

**stahta01** · 03-06-2012

I strongly suggest testing with the good compute function once more, on "0.0 + 0.0", "1.0 + 3.0", and "3.0 - 1.0", to confirm your pack and unpack are correct.
Your current code does a really bad job on those.

Tim S.

**PuddleOfShane** · 03-06-2012

Checked it with the solid compute function again and all of those additions work perfectly. The error definitely isn't in the pack and unpack functions.

Originally Posted by stahta01

Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers

Do you have to add back this leading 1 bit in your program?

Tim S.

I've tried both stripping away the 1 bit as well as making sure it's set to 1 and neither changes the output.

**oogabooga** · 03-06-2012

Note that this

Code:

*s = (BIT31 & n) >> 31;

may result in a value of -1 (all ones) or 1, depending on your system. How about

Code:

*s = (BIT31 & n) != 0;

This will never be true since n2 is an unsigned int:

Code:

if(n2<0)
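
Both points concern the sign bit, so here is one small sketch for them. It assumes uint is a typedef for a 32-bit unsigned type (the header isn't shown); in that case the `>> 31` shift yields 0 or 1, whereas right-shifting a negative signed value is implementation-defined. The `!= 0` form gives 0 or 1 either way. Also note that IEEE 754 is sign-magnitude, so negating a float means flipping bit 31, not taking a two's complement. sign_of and negate_bits are hypothetical names.

```c
#include <stdint.h>

/* 1 if the sign bit (bit 31) of the bit pattern is set, otherwise 0.
 * This is the test to use instead of comparing an unsigned value to 0. */
int sign_of(uint32_t n)
{
    return (n & 0x80000000u) != 0;
}

/* Flip the sign bit. For IEEE 754 this negates the value; exponent and
 * fraction are untouched because the format is sign-magnitude. */
uint32_t negate_bits(uint32_t n)
{
    return n ^ 0x80000000u;
}
```

One possible use: treat `a - b` as `a + negate_bits(b)` so that only a signed addition has to be written.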

Comparing f1 and f2 the way you do to determine the sign is not going to work since you're not considering the exponents.
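
As an example of why the fractions alone mislead: 1.5 has significand 0xc00000 and 4.0 has 0x800000, yet 4.0 is the larger number. A hedged sketch of a magnitude comparison for finite values, assuming 32-bit IEEE 754 singles; cmp_magnitude is a hypothetical name.

```c
#include <stdint.h>

/* Compare |a| and |b| for finite IEEE 754 singles given as bit patterns.
 * Returns -1, 0 or 1. With the sign bit cleared, the exponent field sits
 * above the fraction, so an unsigned compare orders by exponent first and
 * by fraction only when the exponents are equal. NaNs are not handled. */
int cmp_magnitude(uint32_t a, uint32_t b)
{
    uint32_t ma = a & 0x7fffffffu;
    uint32_t mb = b & 0x7fffffffu;

    if (ma < mb)
        return -1;
    if (ma > mb)
        return 1;
    return 0;
}
```

The sign of a sum or difference can then be taken from whichever operand has the larger magnitude.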

It makes more sense to call first_nzb from within normalize instead of in its caller parameters.
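
A sketch of what that could look like, assuming an unsigned significand and a helper that returns -1 for zero. top_bit and normalize_sig are hypothetical names standing in for first_nzb and normalize, and the zero guard is my addition: without it a zero significand would be shifted by a meaningless amount.

```c
#include <stdint.h>

/* Index of the highest set bit of f, or -1 if f is 0. */
int top_bit(uint32_t f)
{
    int k = -1;

    while (f != 0) {
        f >>= 1;
        k++;
    }
    return k;
}

/* Move the leading 1 of *f to bit 23 and adjust *e to compensate.
 * The bit search happens here, so the caller no longer passes k. */
void normalize_sig(int *e, uint32_t *f)
{
    int k = top_bit(*f);
    int shift;

    if (k < 0)
        return;               /* zero: nothing to normalize */
    shift = k - 23;
    if (shift > 0)
        *f >>= shift;         /* too wide: drop low bits (truncates) */
    else
        *f <<= -shift;        /* too narrow: move the leading 1 up */
    *e += shift;
}
```

The caller then only needs `normalize_sig(&e3, &f3);`.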

Anyway, those are a couple of points I can see, but I haven't looked in detail at your logic.

**iMalc** · 03-07-2012

Focus on one calculation at a time, step through things with the debugger and see where it goes wrong.

Perhaps start with 0.0 + 0.0 as that should be easy to debug.

Do you need this to work correctly for denormals, infinities, or NaNs?
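
For reference, those cases are all identified by the exponent field: 0 means zero or a denormal, 255 means infinity or NaN, and the fraction decides which. A minimal sketch, assuming 32-bit IEEE 754 singles; the enum and classify_bits are hypothetical names.

```c
#include <stdint.h>

enum float_kind { KIND_ZERO, KIND_DENORMAL, KIND_NORMAL, KIND_INF, KIND_NAN };

/* Classify a 32-bit IEEE 754 single by its exponent and fraction fields.
 * The sign bit is ignored, so -0.0 and -infinity classify like +0.0
 * and +infinity. */
enum float_kind classify_bits(uint32_t n)
{
    uint32_t exp_field = (n >> 23) & 0xffu;
    uint32_t frac = n & 0x007fffffu;

    if (exp_field == 0)
        return frac == 0 ? KIND_ZERO : KIND_DENORMAL;
    if (exp_field == 0xffu)
        return frac == 0 ? KIND_INF : KIND_NAN;
    return KIND_NORMAL;
}
```

Checking the kind of both operands before any arithmetic keeps the special cases out of the main add path.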

**PuddleOfShane** · 03-07-2012

Yeah, I need to make allowances for those cases, but I'm just focusing on getting it to function basically first. I made some tweaks that give me the proper numbers; unfortunately, it only works for a few calculations in a row, then it starts doing wonky things to the sum, like dividing it by 2...

**oogabooga** · 03-07-2012

Since you've given no indication of noticing, please note that I've pointed out a couple of errors in your code in my post above.

**iMalc** · 03-08-2012

If you want more help then you need to show updated code. At the moment I'm assuming you've solved the problem on your own and not told us.
