Thread: Conversion from float to double

  1. #1
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39

    Conversion from float to double

    Hi, I have a problem with converting a float to a double.
    I want the double value to be the exact same value as the float value

    Code:
    float fValue(250.840);
    double dValue(0.0);
    dValue = static_cast<double>(fValue);
    The value in dValue is 250.83999633789 instead of 250.8400000000

    Can someone help me out?
    Greetings.

  2. #2
    mustang benny bennyandthejets's Avatar
    Join Date
    Jul 2002
    Posts
    1,401
    It worked fine for me. What compiler, OS, etc?
    Microsoft Visual Studio .NET 2003 Enterprise Architect
    Windows XP Pro


  3. #3
    Registered User
    Join Date
    Sep 2003
    Posts
    135
    What value was actually stored in fValue? Did you output it to take a look? Also, note that you initialise fValue with a double.

  4. #4
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39
    When I'm debugging using VC++ (using the Watch window), I see the exact values of the fValue and the dValue.
    My OS is Win2000.

    I have a feeling that not many people see this phenomenon as a big problem.

    I've worked my way around it, but there should exist a better solution.
    Greetings.

  5. #5
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39

    output values against watch values

    This is the code I used:

    float fValue(250.84f);
    double dValue(0.0);
    dValue = static_cast<double>(fValue);
    cout<<fValue<<endl;
    cout<<dValue<<endl;


    When I put a breakpoint on the first cout statement, the debugger shows fValue as 250.840 and dValue as 250.83999633789.

    When I continue the program, the output on screen is 250.84 for each value.

    But in my opinion this is not correct, because the cout statement interprets the dValue as a float and rounds the value automatically. In the real world I'm not outputting the values on screen. I work with the binary data (the 4 bytes of the float and the 8 bytes of the double value) and I perform calculations on these double values. Therefore I must have the exact same values for the double type as for the float type.

    Any solutions?
    Greetings.

  6. #6
    Registered User quagsire's Avatar
    Join Date
    Jun 2002
    Posts
    60
    float occupies 4 bytes in memory and has a precision of 7 digits. double occupies 8 bytes in memory and has a precision of 15 digits. It is impossible to create more precision than what you already have. This means that if you cast a float to a double, the resulting double will still only have a precision of 7 digits.

    250.84 can probably not be represented exactly with 7 digits of precision, so the actual value stored may be 250.8399. When upcasting to a double there is no guarantee what will happen to the remaining digits of precision. It may cast it to 250.8399xxxxxxxxxx where x can be anything. If you created the double from scratch it would be represented as accurately as possible using the full 15 digits of precision. This is something like 250.840000000000. As you can see there is a difference in the representation.

    I don't know the context in which you wish to use the float that was cast to a double, so it is impossible to suggest a solution.

  7. #7
    Registered User
    Join Date
    Sep 2003
    Posts
    135
    When you convert from float to double it should result in the same value in the double as you had in the float. (Converting from double to float, even if the value is in range, doesn't have the same guarantee).

    If you output the float and double to the screen displaying greater precision than the default, they do both in fact show more digits of precision, and they're the same. If you test for equality between the two, they should show equality. (Note that comparing floating point values directly is not something you generally want to do).

    I suspect that what you're actually seeing is something to do with a difference in how VC++ displays floats and doubles in the watch window - everything other than that points to them both holding the same value.

  8. #8
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39
    To tell you briefly why I need to convert a float to a double:

    I'm working on a Windows CE platform. I have a file containing numerical values of 4 bytes (unsigned long, int, long, float...). This means that when I read a numerical value as a floating point value, it must be read into a float variable. I have another file that stores only double values (8 bytes per numeric value).
    I can only do calculations with values from the double file.
    When I want to perform calculations on the float values, I have to store them first in the file with double values. This is where the conversion from float to double is important.
    Let's say I calculate the sum of two float values, both with the same value (250.840). The answer is easy: 501.680.
    When I convert these values to double values first, the answer is not the same anymore (501.67999267578).
    Of course when I round the value I can get the exact answer, but this is something I don't really want to do, because I don't know the precision that must be used to round the value.

    For those that have already answered me, thanks for the effort.

    I hope someone will have the perfect solution I need to solve this problem.
    Greetings.

  9. #9
    jasondoucette.com JasonD's Avatar
    Join Date
    Mar 2003
    Posts
    278
    The problems you are having have already been answered in this thread. To sum it all up quickly:

    float only has 7-8 decimal digits of precision. If you take a value stored in a float, and then store it in a double (which has 15-16 decimal digits of precision), you still only have a value that is precise to 7-8 decimal digits. Yes, double can store 15-16 digits, but the number stored in it is only precise to 7-8 digits. For example, take the value 1/3 in decimal with an 8-digit limit:
    0.33333333
    Now, store it in a 16-digit limit format:
    0.3333333300000000

    Therefore, when you compare the final answer, you cannot compare exactly - only the first 6 or 7 digits will probably be accurate, despite the fact that you can store 15 or 16 digits.

    Please note that you cannot even do this when dealing solely with double variables, since floating point numbers are not stored exactly. Try storing 1/3 in decimal 8-digit limit, again, and you'll see what I mean:
    0.33333333
    When you reach the storage limit, and cannot store any more 3's, and if you were to multiply this number by 3, instead of 1.00000000
    you get:
    0.99999999.
    If you were to compare this to 1.00000000, they would not match exactly. You need a threshold, normally called 'epsilon', set to some reasonable amount, say 0.0000001. Compare the two numbers by subtracting one from the other, take the absolute value, and if this result is less than epsilon, you can treat the two numbers as close enough to be identical. The value of epsilon is up to you - just make sure it works in the worst-case scenario.

    I hope this helps.

  10. #10
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39
    It seems to me that JasonD is discussing the comparison of representations of real numbers. I don't want to take this thread in that direction, sorry (thanks for the response anyway).
    I'm very certain that my float-values have a precision of 4 digits at most.

    Does anyone else have a solution?
    Greetings.

  11. #11
    Registered User
    Join Date
    Sep 2003
    Posts
    135
    A solution to what? I'm fairly sure the two values are the same, it's just the way VC++ displays floats and doubles in the watch window. Stop looking at the values in the watch window and instead try:

    [1] Outputting both the float and double with greater precision to the screen and see if they're the same (they were when I tried it)

    [2] Compare the float and double using the equality operator ==.

    As I suggested earlier, testing equality of floating point values is generally not a good idea, instead you should check whether they differ by less than some very small value. But in this case, where a float is assigned (or cast) to a double, testing for equality should be fine.

    Note that you don't need a cast to go from float to double in C or C++. You can assign directly.

  12. #12
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39

    Solution

    To Omnius:
    You send the output to the screen using cout. Try using printf.

    I found the solution, I've called it '_ftod':
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    // Prints the float as text with 5 decimal places, then parses that text as a double.
    double _ftod(float fValue)
    {
    	char czDummy[64]; // large enough for any float printed with "%9.5f"
    	snprintf(czDummy, sizeof czDummy, "%9.5f", fValue);
    	double dValue = strtod(czDummy,NULL);
    	return dValue;
    }
    
    int main(int argc, char* argv[])
    {
    	float fValue(250.84f);
    	double dValue = _ftod(fValue);//good conversion
    	double dValue2 = fValue;//wrong conversion
    	printf("%f\n",dValue);//250.840000
    	printf("%f\n",dValue2);//250.839996
    	return 0;
    }
    JasonD said something about precision. Et voilà.

    Thank you all for taking a moment to look at my problem:
    bennyandthejets, Omnius, quagsire and JasonD
    Greetings.

  13. #13
    Registered User
    Join Date
    Sep 2003
    Posts
    135
    It still appears to me that you're chasing a phantom problem, and the more you chase the more you convince yourself that you've got a solution to this phantom problem.

    Your _ftod does NOT convert a float to a double. It converts a string to a double. The string is created using the original float with reduced precision, which in this case matches the output seen using the watch window in VC++.

    Here's what you said at the start of the thread:

    Hi, I have a problem with converting a float to a double.
    I want the double value to be the exact same value as the float value


    You haven't achieved that with your solution, you've just fooled yourself into thinking you've achieved it because for the single value that you're testing it with it matches what you (erroneously) expect to see.

    There is no cast required to go from float to double, you can simply assign the float to the double and it will have the same value. You're using C; in C this much is guaranteed to be true. Assignment will result in the two having the same value. No conversion function is required. You're chasing a phantom problem that doesn't exist because when you examine the float in the watch window it displays the value with less precision than it displays the double.

    You're wasting your time :-)
    Last edited by Omnius; 10-08-2003 at 09:18 AM.

  14. #14
    Registered User Ward's Avatar
    Join Date
    Sep 2001
    Location
    Belgium
    Posts
    39
    Ok, then how will you do the following?

    You have two float values (250.84)
    You must add these values, but the calculation may only be done using an add-operation with only double values.

    Let's say I have the following function for that:
    Code:
    double doubleAdder(double d1, double d2)
    {
      return d1+d2;
    }
    How will you make the answer be exactly 501.68?
    Greetings.

  15. #15
    Registered User
    Join Date
    Sep 2003
    Posts
    135
    I think you need to understand floating point representation a little better than you do right now. (That's not meant to sound rude or insulting by the way, it's part of the learning process for anyone programming in the C language family - sooner or later everyone comes across this). I don't have a handy link to suggest, although I have come across some useful material over the years; if I find something I'll post it. (Anyone?)

    Most values can't be exactly represented using floating point representation, only an approximation. You suggest that you start out with two floating point values of 250.84, but you've probably actually got something very close but not exactly 250.84 - as we've seen from the earlier posts in the thread.

    If you add two values that are not precisely 250.84 then understandably the sum will not be precisely 501.68. Welcome to floating point representation ;-)

    The error can be magnified further if a calculation involves something more complex than simple addition.

    Your original need was to convert float to double and result in the two having the same value - this is achievable using plain old assignment.

    If you need something more complex, e.g. a representation that is accurate to two decimal places with associated operations then perhaps some form of fixed point representation would be of use to you, or some other (custom-built) data type, rather than floating point.

    Last Post: 10-11-2004, 04:50 PM