Thread: floating points

  1. #1
    Registered User
    Join Date
    Feb 2013
    Posts
    100

    floating points

    I'm trying to write two functions.
    One called: void displayFloat(float f);
    And the other: float makeFloat(char* f);

    I want displayFloat to simply take in a float and display the 32 bits that represent the passed float. For example, if -5.8125 was passed, this would get printed out:
    1 10000001 01110100000000000000000

    I'd like makeFloat to do the opposite. For example, pass "-101.1101" to the function and it should return -5.8125

    I understand I need to do bit manipulation with unsigned ints, since they have 32 bits, the same as floats, and that I need a union like so:
    Code:
    typedef union fi {
       unsigned int i;
       float f;
    }fi;
    If anyone can point me in the right direction with this bit manipulation stuff, that'd be great. In the meantime I'll be reading up on more bit manipulation tutorial material. Thank you!

  2. #2
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by johngoodman View Post
    If anyone can point me in the right direction with this bit manipulation stuff, that'd be great. In the meantime I'll be reading up on more bit manipulation tutorial material. Thank you!
    Code:
    #include <string.h>   /* for memcpy */

    unsigned char bits[sizeof(float)];
    float x = 3.14f;

    /* copy the raw bytes of x into the bits array */
    memcpy(bits, &x, sizeof(float));

    /* now you can use & and | and printf() to examine the bits array */
    sizeof(float) is virtually always 4 and the encoding is virtually always IEEE 754, but this isn't guaranteed on every C platform.
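
    As a rough sketch of where the memcpy() approach can lead, the bytes can be copied into a single 32-bit integer instead of a byte array, which avoids having to think about byte order when masking. The FloatFields struct and the splitFloat name are made up for this illustration, and the code assumes a 32-bit IEEE 754 float, which is exactly the assumption flagged above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type, not from the thread: the three IEEE 754 fields. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, the implicit leading 1 is not stored */
} FloatFields;

/* Assumes float is a 32-bit IEEE 754 value. */
FloatFields splitFloat(float x)
{
    uint32_t raw;
    FloatFields out;

    assert(sizeof raw == sizeof x);
    memcpy(&raw, &x, sizeof raw);   /* copy the bits, do not convert the value */

    out.sign     = (unsigned)(raw >> 31);
    out.exponent = (unsigned)((raw >> 23) & 0xFFu);
    out.mantissa = raw & 0x7FFFFFu;
    return out;
}
```

    Under those assumptions, splitFloat(-5.8125f) should give sign 1, exponent 129 (binary 10000001) and mantissa 0x3A0000, which matches the bit pattern in the first post.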
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by johngoodman View Post
    if -5.8125 was passed, this would get printed out:
    1 10000001 01110100000000000000000

    I'd like makeFloat to do the opposite. For example, pass "-101.1101" to the function and it should return -5.8125
    That's very much not the opposite though. The opposite would be to accept 1 10000001 01110100000000000000000 and return -5.8125.
    You're instead expecting to pass in a fixed-point representation of the number, and include a minus sign for negatives. That's very different.

    To make sure these are really opposites of one another, I would write a function which took a float and returned a char* (probably just point to a static buffer and not care about multithreading at this stage), then call that from within displayFloat. This way you can confirm that the output of one, if fed into the other, gives you back what you started with. I did this for my own int to Roman numeral converter and it worked great!
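
    A minimal sketch of that round-trip idea, assuming a 32-bit IEEE 754 float. The names floatToBitString and bitStringToFloat are hypothetical and do not appear anywhere in the thread. The first function returns a pointer to a static buffer, as suggested above, so each call overwrites the previous result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Formats f as "s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm"
   in a static buffer (not thread-safe, overwritten on every call). */
const char *floatToBitString(float f)
{
    static char buf[35];            /* 32 bits + 2 spaces + '\0' */
    uint32_t raw;
    int bit, pos = 0;

    assert(sizeof raw == sizeof f); /* assumes a 32-bit float */
    memcpy(&raw, &f, sizeof raw);

    for (bit = 31; bit >= 0; bit--) {
        buf[pos++] = (char)('0' + ((raw >> bit) & 1u));
        if (bit == 31 || bit == 23)
            buf[pos++] = ' ';       /* after the sign, after the exponent */
    }
    buf[pos] = '\0';
    return buf;
}

/* The exact inverse: rebuilds the float from such a bit string.
   Any character other than '0' or '1' (the spaces) is skipped. */
float bitStringToFloat(const char *s)
{
    uint32_t raw = 0;
    float f;

    for (; *s != '\0'; s++)
        if (*s == '0' || *s == '1')
            raw = (raw << 1) | (uint32_t)(*s - '0');

    memcpy(&f, &raw, sizeof f);
    return f;
}
```

    With this pair, bitStringToFloat(floatToBitString(x)) should return x for any ordinary value, which is the self-check described above. displayFloat could then simply print the string returned by floatToBitString.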
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Feb 2013
    Posts
    100
    This is what I'm running now, but it outputs 1 01111111 11111111111111111111111 which is obviously not -5.8125

    Code:
     
    typedef union fi {
       unsigned int i;
       float f;
    }fi;
    
    
     void displayFloat(float f) {
      fi myUnion;
      myUnion.f = f;
      int theBitValue;
      for(int i = 1; i <= 32; i++){
       theBitValue = bitValue(myUnion.f,i);
       printf("%d", theBitValue);
       if(i == 1){
        printf(" ");
       }
       if(i == 9){
        printf(" ");
       }
      }
      
     }
     
     
     int bitValue(unsigned int num, int index){
      unsigned int mask = 1 << index;
      num &= mask;
      return num >> index;
     }

  5. #5
    Registered User
    Join Date
    Feb 2013
    Posts
    100
    Never mind, I fixed it by doing a simple bit shift, haha. But I'm still working on makeFloat.
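
    The thread does not show the fix, so the following is only a guess at what a working version might look like. Two things in the code from post #4 look suspicious: bitValue is handed myUnion.f, which converts the float's value to an unsigned int rather than reinterpreting its bits, and the loop index runs from 1 to 32, so the last iteration shifts by the full width of the type. This sketch reads the integer member instead and walks from bit 31 down to bit 0. It assumes unsigned int and float are both 32 bits wide.

```c
#include <assert.h>
#include <stdio.h>

typedef union fi {
    unsigned int i;   /* assumed to be 32 bits wide, like float */
    float f;
} fi;

/* Returns bit `index` of num, where index 0 is the least significant bit. */
int bitValue(unsigned int num, int index)
{
    return (int)((num >> index) & 1u);
}

void displayFloat(float f)
{
    fi myUnion;
    int index;

    assert(sizeof(unsigned int) == sizeof(float));

    myUnion.f = f;
    /* Walk from bit 31 (sign) down to bit 0, reading the integer member. */
    for (index = 31; index >= 0; index--) {
        printf("%d", bitValue(myUnion.i, index));
        if (index == 31 || index == 23)
            printf(" ");    /* separate sign, exponent and mantissa */
    }
    printf("\n");
}
```

    Defining bitValue before displayFloat also avoids calling it without a prior declaration.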

  6. #6
    Registered User
    Join Date
    Feb 2013
    Posts
    100
    So for the makeFloat function I'd like to call it like so: float result = makeFloat("-101.1101"); So I pass it that value, which is binary, but I need to convert it to an actual float and return it. The value that is passed is -5.8125, I believe. I just don't know how to write an algorithm for this.

  7. #7
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Quote Originally Posted by johngoodman View Post
    So for the makeFloat function I'd like to call it like so: float result = makeFloat("-101.1101"); So I pass it that value, which is binary, but I need to convert it to an actual float and return it. The value that is passed is -5.8125, I believe. I just don't know how to write an algorithm for this.
    Sounds like you want to convert binary to decimal. Remember that the mathematical value of a binary integer number is

    b_n * 2^n + b_(n-1) * 2^(n-1) + ... + b_0 * 2^0

    where b_i is bit i

    For the fractional part, you can extend this pattern using negative powers of the base. For example, .101 is equal to

    1 * 2^(-1) + 0 * 2^(-2) + 1 * 2^(-3)

    Basically, you need to look at each individual digit and its associated power of the representation base (2 for binary), and add the "place-value" of the digit to an accumulator sum initialized to 0.
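
    To make the place-value sum concrete, here is a small sketch that evaluates an unsigned binary string term by term, adding 2^i for every digit that is 1. The names powerOfTwo and binaryValue are invented for this example, and the code assumes the input contains only '0', '1' and at most one '.'.

```c
#include <assert.h>
#include <string.h>

/* 2 raised to `power`, for small positive or negative powers. */
static double powerOfTwo(int power)
{
    double result = 1.0;
    for (; power > 0; power--) result *= 2.0;
    for (; power < 0; power++) result /= 2.0;
    return result;
}

/* Hypothetical helper: value of a string such as "101.1101",
   computed as the sum of b_i * 2^i over all digits. */
double binaryValue(const char *digits)
{
    const char *point;
    int power;
    double sum = 0.0;

    assert(digits != NULL);
    point = strchr(digits, '.');

    /* The digit just left of the point (or the last digit) has power 0. */
    power = (int)((point ? point : digits + strlen(digits)) - digits) - 1;

    for (; *digits != '\0'; digits++) {
        if (*digits == '.')
            continue;               /* the point itself carries no value */
        if (*digits == '1')
            sum += powerOfTwo(power);
        power--;                    /* each digit to the right is worth half */
    }
    return sum;
}
```

    For "101.1101" the terms that contribute are 2^2 + 2^0 + 2^(-1) + 2^(-2) + 2^(-4) = 4 + 1 + 0.5 + 0.25 + 0.0625 = 5.8125.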

    BTW, it's a bad idea to work with the internal representation (individual bits) of a "float" unless you are writing low-level assembly library code that is allowed to make assumptions about the internal format the architecture uses for floats, like IEEE 754, etc. I'd suggest sticking with simple "float" values and using the mathematical definition to convert.
    Last edited by MacNilly; 03-13-2013 at 05:55 PM.

  8. #8
    Registered User
    Join Date
    Feb 2013
    Posts
    100
    Quote Originally Posted by MacNilly View Post
    Sounds like you want to convert binary to decimal. Remember that the mathematical value of a binary integer number is

    b_n * 2^n + b_(n-1) * 2^(n-1) + ... + b_0 * 2^0

    where b_i is bit i

    For the fractional part, you can extend this pattern using negative powers of the base. For example, .101 is equal to

    1 * 2^(-1) + 0 * 2^(-2) + 1 * 2^(-3)

    Basically, you need to look at each individual digit and its associated power of the representation base (2 for binary), and add the "place-value" of the digit to an accumulator sum initialized to 0.

    BTW, it's a bad idea to work with the internal representation (individual bits) of a "float" unless you are writing low-level assembly library code that is allowed to make assumptions about the internal format the architecture uses for floats, like IEEE 754, etc. I'd suggest sticking with simple "float" values and using the mathematical definition to convert.
    But how do you traverse the list of bits and find the value it represents? Bit shifting?

  9. #9
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    You don't really need bit shifting[*]. You've said that you pass the binary value as a string ("-101.1101"). So just iterate through the characters, and whenever one is '1', add the corresponding "place value". Of course you also have to take care of the point ('.') and the leading minus sign.

    [*] Instead of calculating the "place value" using bit shifting, you could also multiply or divide by 2 on each iteration.
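
    One possible shape for makeFloat along these lines, offered as a sketch rather than a finished solution. It takes a const char* instead of the char* in the original prototype so that string literals can be passed safely. It assumes the input is an optional leading '-', followed by '0' and '1' digits with at most one '.', and it simply stops at the first character that does not fit.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of makeFloat: parses binary strings such as "-101.1101". */
float makeFloat(const char *s)
{
    float value = 0.0f;
    float weight = 0.5f;    /* place value of the first digit after the point */
    int negative = 0;
    int seenPoint = 0;

    assert(s != NULL);

    if (*s == '-') {
        negative = 1;
        s++;
    }

    for (; *s != '\0'; s++) {
        if (*s == '.' && !seenPoint) {
            seenPoint = 1;
        } else if (*s == '0' || *s == '1') {
            if (!seenPoint) {
                /* integer part: shift what we have one place left, add the digit */
                value = value * 2.0f + (float)(*s - '0');
            } else {
                /* fractional part: add the current weight, then halve it */
                if (*s == '1')
                    value += weight;
                weight /= 2.0f;
            }
        } else {
            break;          /* unexpected character: stop parsing */
        }
    }
    return negative ? -value : value;
}
```

    With this sketch, makeFloat("-101.1101") should return -5.8125. No bit shifting or knowledge of the float's internal layout is involved, only multiplying and dividing by 2.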

    Bye, Andreas

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I'd like to help further, but I'm still unclear as to whether you intend the string -> float operation to be the reverse of the float -> string operation, or whether you don't, as your example shows.
    If you don't, then I'm uncertain as to whether you realise the limitations of using a totally different representation, and whether you are really ultimately making the correct decision.

    Can you please clear up this contradiction between your specified goal and the example given? Which is correct? Are you aware that the fixed-point representation in your example will actually be harder to convert back to a float than if you truly did the exact reverse operation?
    What is this really being used for?
