IEEE floating point arithmetic

  1. #1
    Registered User
    Join Date
    Mar 2012
    Posts
    5

    IEEE floating point arithmetic

    Alright, so I'm trying to make a program that calculates the sum of two numbers through two methods. One is basic integer arithmetic (simple), and the other is through IEEE floating point arithmetic (not so simple). I'm using basic C bit manipulation operators (shifts, bit-mask macros, etc.). The int main() code was prewritten for me; I was just tasked with writing the functions. I know the first three functions work properly because I've tested them with a working compute function. Unfortunately, when I plug in my own compute function, it compiles but doesn't behave the way it's supposed to at all. Here's my code (the functions are declared in the header, but everything else is in the .c file):

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include "floatcomp.h"
    #define BIT31 ((uint) 0x80000000)
    #define BIT30 ((uint) 0x7f800000)
    #define BIT22 ((uint) 0x007fffff)
    #define BIT0 ((uint) 0x00000000)
    #define BIT1 ((uint) 0x00800000)
    #define BITALL ((uint) 0xffffffff)
    
    int showmode = 0;
    
    int main(int argc, char *argv[])
    {
        float f1, f2, r1, r2;
        char op;
        int stop = 0, nitem;
    
        if ((argc == 2) && (argv[1][0]=='v'))
          showmode = 1;
    
        while ((nitem = scanf("%f %c %f", &f1, &op, &f2)) == 3) {
          switch (op) {
          case '+': r1 = f1 + f2; break;
          case '-': r1 = f1 - f2; break;
          case '*': r1 = f1 * f2; break;
          case '/': r1 = f1 / f2; break;
          default: 
            stop = 1;
            break;
          }
      
          if (stop) break;
    
          if (showmode) {
            printf("\n");
            printf("---- standard computation ---------\n\n");
          }
          printf("%13.6f %c %13.6f = %13.6f\n", f1, op, f2, r1);
    
          uint xf1 = *(uint *)&f1;
          uint xf2 = *(uint *)&f2;
    
          if (showmode) {
            printf("\n");
            printf("---- integer based computation ----\n\n");
            printf("  f1: 0x%08x\n", xf1);
            printf("  f2: 0x%08x\n", xf2);
            printf("\n");
          }
    
          r2 = compute(xf1, xf2, op);
    
          if (showmode) {
            printf("\n");
          }
    
          printf("%13.6f %c %13.6f = %13.6f\n\n", f1, op, f2, r2);
        }
    
        if (nitem != EOF)
        printf("input expression format error\n");
    }
    
    uint unpack(uint n, int *s, int *e, int *f) // extracts the sign bit, exponent field, and mantissa from the number
    {
        *s= (BIT31 & n) >> 31;
        *e= (BIT30 & n) >> 23;
        *f= (BIT22 & n);
    
            if(*e!=0)
        *f = *f | BIT1;     
    
        *e = *e - 127;
    }
    
    uint  pack(int s, int e, int f) // opposite of unpack, puts the disassembled binary back together
    {
        s = s << 31;
        e = (e + 127) << 23;
        f = (f & BIT22);
    
        int n = s | e | f;
    
        return (n);    
    }
    
    int   first_nzb(int f) //finds left-most non-zero bit and returns location in the bit
    {
        int i;
        for(i=31;i>=0;i--)
        {
            if((1<<i) & f)
                return (i);
            if(i==0)
                return(-1);
        }
    }
    
    void  normalize(int k, int *e, int *f) //puts the fraction where it's supposed to be and determines whether the exponent needs to be increased or decreased accordingly
    {
        int shift = k - 23;
        if (shift > 0)
            *f = *f >> shift;
        else
            *f = *f << (-shift);
        
        *e = *e + shift;
    }
    
    float compute(uint n1, uint n2, char op)
    {
        int shift;
        int s1, e1, f1, s2, e2, f2; 
        int k, n3, f3, s3, e3;
        unpack(n1, &s1, &e1, &f1); //extract appropriate parts from the bit numbers
        unpack(n2, &s2, &e2, &f2);
        if (op == '-')
        {
            if(n2<0) //if it's a negative number, take the two's complement by toggling the bits and adding one
            n2 = ~n2 + 1;
            if(f1>f2) //if the number being subtracted from is greater, sign is positive
            {
                s3=s1;
            }
            if(f1<f2) //if the number being subtracted is larger, sign is negative
            {
                s3=s2;
            }
        }
    
        if(e1 > e2)
        { 
            //if exponent of n1 is larger, use its exponent and shift the fraction by the difference between the exponents
            e3 = e1;
            f2 = f2 << (e1 - e2); 
        }
        else if(e2 > e1)//if exponent of n2 is larger, use its exponent
        { 
            //same as the previous branch, just the other way around
            e3 = e2;
            f1 = f1 << (e2 - e1); 
        }
        else if(e1 == e2)
        {
            if((f1+f2) > 0)
            {
                e3 = e2 + 1; // if the exponents are equal, but the fraction is greater than 0, shift 1
                shift = 1;
            }
            else
            e3 = e2;
        }
    
        if(op == '+')
        {
            s3=s1; //if adding, just pick first sign, because they're both the same
        }
        f3 = f1 + f2; // add fractions, then call the helper functions to normalize and pack the result
        if(shift==1) f3 = f3 >> 1;
        normalize(first_nzb(f3),&e3, &f3);
        f3 = f3 & BIT22;
        uint x = pack(s3,e3,f3);
        float *fp;
        fp = &x;
        return *fp;
    }
    I'll admit, I'm a little in over my head with this. I usually program in C++, and this is my first experience with bit manipulation operators. I'm sure I'm making some blatant logic error, but I don't understand the syntax well enough to see what it is. Anyway, the numbers I get are drastically off. Well, the second one at least; obviously the addition done by integer arithmetic (the first number it returns) is correct. This is a lab assignment, and the TA said we could ask outside sources for advice, but please don't just give me the entire code, or it would defeat the purpose.

    Thanks,
    Aaron

  2. #2
    Registered User
    Join Date
    May 2009
    Posts
    2,581
    Warnings I got trying to get your code to compile.

    Code:
    H:\SourceCode\Projects\testieee\main.c||In function 'compute':|
    H:\SourceCode\Projects\testieee\main.c|166|warning: assignment from incompatible pointer type [enabled by default]|
    H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'n3' [-Wunused-variable]|
    H:\SourceCode\Projects\testieee\main.c|116|warning: unused variable 'k' [-Wunused-variable]|
    H:\SourceCode\Projects\testieee\main.c||In function 'first_nzb':|
    H:\SourceCode\Projects\testieee\main.c|99|warning: control reaches end of non-void function [-Wreturn-type]|
    H:\SourceCode\Projects\testieee\main.c||In function 'unpack':|
    H:\SourceCode\Projects\testieee\main.c|76|warning: control reaches end of non-void function [-Wreturn-type]|
    H:\SourceCode\Projects\testieee\main.c||In function 'main':|
    H:\SourceCode\Projects\testieee\main.c|64|warning: control reaches end of non-void function [-Wreturn-type]|
    ||=== Build finished: 0 errors, 6 warnings (0 minutes, 0 seconds) ===|
    Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers
    The leading 1 bit is omitted because it contains no information. Since all numbers except zero start with a leading 1, the leading 1 is left implicit.
    Do you have to add back this leading 1 bit in your program?

    Tim S.
    Last edited by stahta01; 03-06-2012 at 06:50 PM.
    "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the Universe is winning." Rick Cook

  3. #3
    Registered User
    Join Date
    Mar 2012
    Posts
    5
    Oh, forgot to say that I'm compiling in the linux terminal. I only get one warning, and I've been told by my TA it's a necessary one. It's about an incompatible pointer type when the *fp is assigned.

  4. #4
    Registered User
    Join Date
    Dec 2011
    Posts
    795
    If you're only getting one warning, you need to compile with these flags: "-Wall -pedantic".

    Also, you could replace
    Code:
    #define BITALL ((uint) 0xffffffff)
    
    /* to */
    
    #define BITALL (0xffffffffu)
    and then make the same change to the rest of them

  5. #5
    Registered User
    Join Date
    Mar 2012
    Posts
    5
    I'm new to linux and I'm unsure as to how to compile with flags.

    Also, I changed the bit declarations, no change.

  6. #6
    Registered User
    Join Date
    May 2009
    Posts
    2,581
    Result seems to be off compiling under MinGW GCC.
    Tim S.

    Note: Added lines to this block to see more info.
    Code:
          r2 = compute(xf1, xf2, op);
    
          if (showmode) {
            uint xr1 = *(uint *)&r1;
            uint xr2 = *(uint *)&r2;
            printf("  r1: 0x%08x\n", xr1);
            printf("  r2: 0x%08x\n", xr2);
            printf("\n");
          }
    Code:
    1.0 + 3.0
    
    ---- standard computation ---------
    
         1.000000 +      3.000000 =      4.000000
    
    ---- integer based computation ----
    
      f1: 0x3f800000
      f2: 0x40400000
    
      r1: 0x40800000
      r2: 0x40e00000
    
         1.000000 +      3.000000 =      7.000000
    
    3.0 - 1.0
    
    ---- standard computation ---------
    
         3.000000 -      1.000000 =      2.000000
    
    ---- integer based computation ----
    
      f1: 0x40400000
      f2: 0x3f800000
    
      r1: 0x40000000
      r2: 0x40e00000
    
         3.000000 -      1.000000 =      7.000000

  7. #7
    Registered User
    Join Date
    May 2009
    Posts
    2,581
    I strongly suggest testing with the good compute function once more, using "0.0 + 0.0", "1.0 + 3.0", and "3.0 - 1.0", to confirm your pack and unpack are correct.
    Your current code does a really bad job on those.

    Tim S.

  8. #8
    Registered User
    Join Date
    Mar 2012
    Posts
    5
    Checked it with the solid compute function again and all of those additions work perfectly. The error definitely isn't in the pack and unpack functions.

    Quote Originally Posted by stahta01 View Post
    Note: From http://en.wikipedia.org/wiki/IEEE_75...ion_of_numbers


    Do you have to add back this leading 1 bit in your program?

    Tim S.

    I've tried both stripping away the 1 bit and making sure it's set to 1, and neither changes the output.
    Last edited by PuddleOfShane; 03-06-2012 at 07:39 PM.

  9. #9
    oogabooga
    Join Date
    Jan 2008
    Posts
    2,808
    Note that this
    Code:
    *s = (BIT31 & n) >> 31;
    may result in a value of -1 (all ones) or 1, depending on your system. How about
    Code:
    *s = (BIT31 & n) != 0;
    This will never be true since n2 is an unsigned int:
    Code:
    if(n2<0)
    Comparing f1 and f2 the way you do to determine the sign is not going to work since you're not considering the exponents.

    It makes more sense to call first_nzb from within normalize instead of in its caller parameters.

    Anyway, those are a couple of points I can see, but I haven't looked in detail at your logic.
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  10. #10
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,304
    Focus on one calculation at a time, step through things with the debugger and see where it goes wrong.

    Perhaps start with 0.0 + 0.0 as that should be easy to debug.

    Do you need this to work correctly for denormals, infinities, or NaNs?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #11
    Registered User
    Join Date
    Mar 2012
    Posts
    5
    Yeah, I need to make allowances for those cases, but I'm focusing on getting the basic version working first. I made some tweaks that give me the proper numbers; unfortunately, it only works for a few calculations in a row, and then it starts doing wonky things to the sum, like dividing it by 2...

  12. #12
    oogabooga
    Join Date
    Jan 2008
    Posts
    2,808
    Since you've given no indication of noticing, please note that I've pointed out a couple of errors in your code in my post above.

  13. #13
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,304
    If you want more help then you need to show updated code. At the moment I'm assuming you've solved the problem on your own and not told us.
