Thread: Testpad for itoa()

  1. #76
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by quzah View Post
    Code:
    void bar( char s[], long l )
    {
        char buf[ BUFSIZ ] = {0};
        char *p = buf, *q = s;
        size_t comma = 0, trail = 0;
        
        sprintf( buf, "%ld", l );
    
        if( *p == '-' )
            *q++ = *p++;
        
        for( trail = strlen( p ) % 3; trail > 0; trail-- )
        {
            *q++ = *p++;
        }
    
        for( comma = strlen( p ) / 3; comma > 0; comma-- )
        {
            *q++ = ',';
            *q++ = *p++;
            *q++ = *p++;
            *q++ = *p++;
        }
    }

    Quzah.
    Hi quzah. Did you test it? how long it is going to take for the
    whole range of integers? Did you see my previous post?

    The use of:
    Code:
        sprintf( buf, "%ld", l );
    makes me think it is going to be quite slow.

    Edit: the simple sprintf() cycle takes:
    Code:
                        Testing version : Sprintf()
                     ------------------------------------
     Testing on Intel Core 2 Duo E6600 2.4 Ghz // IA32
     OS = Windows 7 Ultimate 64 bit  --  Compiler = Pelles C 6.00.4
    
     Testing the numbers from: 0  to  100,000,000
    
     The generating process has taken:  43,633 ms  to perform the whole cycle
    using this code:

    Code:
         for (x = 0; x < times; x++){
             sprintf( buffer, "%ld", x );
         } // end x for
    quite slow indeed for testing 4 billion combinations it is going to take
    more or less 20' by itself.
    Last edited by frktons; 07-13-2010 at 03:38 PM.

  2. #77
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    I wasn't doing it for speed. I was just having fun.

    There's also two calls to strlen which could be reduced to one, or you could modify the sprintf line to count its length for you using the n modifier:
    Code:
    sprintf( buf, "%ld%n", l, &length );
    Edit: Your testpad fails to clean the buffer it passes between calls.


    Quzah.
    Last edited by quzah; 07-13-2010 at 03:37 PM.
    Hope is the first step on the road to disappointment.

  3. #78
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by quzah View Post
    I wasn't doing it for speed. I was just having fun.

    There's also two calls to strlen which could be reduced to one, or you could modify the sprintf line to count its length for you using the n modifier:
    Code:
    sprintf( buf, "%ld%n", l, &length );
    Edit: Your testpad fails to clean the buffer it passes between calls.


    Quzah.
    Good to see that you're having fun. Enjoy.

    The cleaning is supposed to be done inside the function, when
    itoas() will be completed. Or not?
    Last edited by frktons; 07-13-2010 at 03:43 PM.

  4. #79
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    void itoas( long l, char s[] )
    {
        char buf[ BUFSIZ ] = {0};
        char *p = buf, *q = s;
        size_t comma = 0, trail = 0;
        int len;
        
        sprintf( buf, "%ld%n", l, &len );
    
        if( *p == '-' )
        {
            *q++ = *p++;
            len--;
        }
    
        for( trail = len % 3; trail > 0; trail-- )
        {
            *q++ = *p++;
        }
    
        for( comma = len / 3; comma > 0; comma-- )
        {
            if( len != 3 )
                *q++ = ',';
            *q++ = *p++;
            *q++ = *p++;
            *q++ = *p++;
        }
        *q = '\0';
    }
    It doesn't seem to change its speed by much if I get rid of strlen and use the built in %n. (The if check fixes an issue with three digit numbers getting a comma.) But this was really just about making a simple version, not for speed.

    Quzah.
    Hope is the first step on the road to disappointment.

  5. #80
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by quzah View Post
    Code:
    void itoas( long l, char s[] )
    {
        char buf[ BUFSIZ ] = {0};
        char *p = buf, *q = s;
        size_t comma = 0, trail = 0;
        int len;
        
        sprintf( buf, "%ld%n", l, &len );
    
        if( *p == '-' )
        {
            *q++ = *p++;
            len--;
        }
    
        for( trail = len % 3; trail > 0; trail-- )
        {
            *q++ = *p++;
        }
    
        for( comma = len / 3; comma > 0; comma-- )
        {
            if( len != 3 )
                *q++ = ',';
            *q++ = *p++;
            *q++ = *p++;
            *q++ = *p++;
        }
        *q = '\0';
    }
    It doesn't seem to change its speed by much if I get rid of strlen and use the built in %n. (The if check fixes an issue with three digit numbers getting a comma.) But this was really just about making a simple version, not for speed.

    Quzah.
    In order to test 4.2 billion combinations I prefer to develop a
    faster one, but your version is good for didactical purposes

  6. #81
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Well, now we have a 50% test that confirms iMalc's itoas() is
    doing a good job.
    I've generated all the strings from "0" to "2,147,483,647" and compared
    the strings itoas() builds with mine.
    Everything looks just fine for positive numbers. I've still to test the
    negative ones, but something tells me it'll be OK as well.
    The version I've developed if quite fast, and takes some 2' for generating,
    formatting and comparing over 2 billion strings:

    Code:
                        Testing version : Nested_chars
                     ------------------------------------
     Testing on Intel Core 2 Duo E6600 2.4 Ghz // IA32
     OS = Windows 7 Ultimate 64 bit  --  Compiler = Pelles C 6.00.4
    
     Testing the numbers from: 0  to  2,147,483,647
    
     The generating/comparing process has taken:
     126,657 ms  to perform the whole cycle
    I'll attach the source code after some cleaning. and some sleep.

    Edit: better to attach the program now, and have your comments,
    I could have made errors that I don't realize at the moment.

    Let me know if you find something wrong, except for the goto
    Last edited by frktons; 07-13-2010 at 08:10 PM.

  7. #82
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    And now that I am between positive [range] and negative [one] some
    replies to your posts. Forgive me, I'm late, but eventually I got time
    to do it.


    Quote Originally Posted by iMalc View Post

    Because the multiplicative inverse I was using only allowed me to perform the simulated division on numbers up to 10000 but not 100000 you'll notice that I had to split the number up into three variables num1, num2, and num3, each being a most 9999. When the compiler performs the same multiplicative inverse trick it can use instructions that basically calculate the full 64-bit multiplication result and then just keep the highest 32-bits from that. This means that it doesn't need the extra step of breaking it up into num1, num2 and num3 first, possibly freeing up some registers as well as avoiding the two divides I had.
    However, on the plus side the code I wrote may end up around the same size overall since the constants used in the machine code are smaller than the constants the compiler would have used since it can't assume the numbers are below 10000, and it may generate slightly more code when doing the same thing itself, to make sure it doesn't overflow a 64-bit multiplication as it may in some cases if you look at the article I linked earlier.

    It probably helps to think of it in several parts:
    • Loop unrolling.
    • Multiplicative inverse to perform division.
    • Reducing total branch count by reordering tests.
    • Pointer arithmetic instead of array indexing.
    • Lookup tables.


    I did make a version which had the same branching scheme that reduced the number of gotos to two, just to see how low I could get it, but I'm actually going to have to admit that the version with gotos was clearer as code wasnt nested about ten levels deep.
    Thanks a lot Malcom, when I reach stage 0.8 [stable and full optionals itoas()]
    I'll take care of these nicely described steps.

    Quote Originally Posted by cyberfish View Post
    @frktons:

    Your machine is 64-bit. There is no "partial" 64-bit mode. If you are running a 64-bit OS, the CPU is running in 64-bit mode.

    The C2D is 64-bit capable.

    And I've decided to scrap my version. It's nothing special, just good old lookup tables.

    Also, the C standard only guarantees a minimum RAND_MAX of 2^16. So it's possible (and very likely) that it's only generating numbers between ~-32000 to 32000. To make sure you have the full range, do something like,
    Code:
    int random_number = (rand() << 16) ^ rand();
    And you should really exclude the random number generation time from the timing, since it would depend on the implementation (different compilers implement rand() differently. Some go for higher quality random numbers, some go for speed).

    srand(0) does NOT make sure we are using the same random numbers. Using the same seed and guarantees you will get the same sequence FROM THE SAME IMPLEMENTATION. But that probably doesn't matter if iterations is big enough.
    My machine is IA32 CPU architecture, capable of running X64 code,
    but it hasn't the full X64 architecture and features, like 64 bit registers.
    I'd be very pleased it you tested the various versions on your Linux with
    GCC and X64 machine and give us some reports.

    Well, in modern machine I think random numbers are 32 bit, at least
    on mine. 16 bit stuff is really and oldy actually.
    I'm sorry you didn't complete your version, an old good lookup table
    is really fine indeed, quite fast and more beginner friendly.
    For the finer details of performance testing, I think we can be satisfied
    enough about what we are doing, without going into too much complexity.

    @Elysia:

    regarding super-fast ultra optimized vs2010 compiler I was obviously
    joking, no offence intended.

    @testing:

    I decided to go the long hard way, thoroughly testing the whole integer range.
    We have capable machines, so let's use them
    Last edited by frktons; 07-14-2010 at 04:00 AM.

  8. #83
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    If it can run 64 bit code, it has all 64 bit features. 64 bit code uses 64 bit instructions and registers. There is no "partial" 64 bit.

    16 bit random numbers have nothing to do with machine width. Just means the algorithm only generates numbers in the 16 bit range, which is very common (check your RAND_MAX).

  9. #84
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    For the finer details of performance testing, I think we can be satisfied
    enough about what we are doing, without going into too much complexity.
    Imagine if the rand() function takes 90% of the time, and the itoa function takes 10%. Do you think the test still means anything? You'll have to exclude the time, or prove that the time rand() takes is negligible (through profiling).

    Testing is tricky business. It only gives you a number, but not what the number means. It will never tell you "the answer is 42, but that doesn't mean anything, because ...".

    It's very easy to draw all kinds of conclusions from invalid tests that really don't mean anything.

    For example, if the rand() takes 90% of the time, and itoa takes 10%.

    Imagine you have 2 programs that run in 95s and 100s.

    Without the knowledge, you would be inclined to draw the conclusion that they are about as fast, within experimental error.

    In reality, one is twice as fast as another.

    Testing is not just throwing a loop together and time it. Sure, it will give you a number. But what does the number mean? There are many many factors involved, some of which can make your results completely useless. It gets even more complicated when optimizing compilers are involved.

    That's why performance testing is not really for beginners. Without taking MANY things into account, you'll just be chasing your own tails, trying to minimize the illusionary number that will have nothing to do with real world performance.

    All tests have errors. But that doesn't mean we shouldn't try to minimize them. What good is your test if it has 200% error?

  10. #85
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by cyberfish View Post
    If it can run 64 bit code, it has all 64 bit features. 64 bit code uses 64 bit instructions and registers. There is no "partial" 64 bit.

    16 bit random numbers have nothing to do with machine width. Just means the algorithm only generates numbers in the 16 bit range, which is very common (check your RAND_MAX).
    I'm afraid Intel manuals would disagree with you, but this is not very
    important. I'd be more interested in your tests, so we could have something to speculate
    upon, as we did with the 0 ms performance optimization.

    In stdlib.h I found:
    Code:
    #define RAND_MAX  0x3fffffff
    I think it'd be bigger than 16 bit value, it looks like a 32 bit value.

    Regarding performance tests I'm aware it is a tricky matter so, as I said,
    we're not going into too much complexity and stay satisfied with what we
    got till now.
    By the way I'm not using random numbers anymore. I performed that test to
    be sure GCC was playing a nasty trick about optimization far superior than any
    human could imagine
    For the time being I'm testing the whole range of integers, and that is a tricky job
    as well

    If you feel like to do it, please try the last code I posted and tell me if you find
    any error, I'm sure I missed something, and what performance you get on your
    system.

    Enjoy
    Last edited by frktons; 07-14-2010 at 11:09 AM.

  11. #86
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I'm afraid Intel manuals would disagree with you, but this is not very
    important
    Which part exactly?

    C2Ds can run in 32-bit or 64-bit mode. When in 64-bit mode, it's 100% real and complete 64-bit. How can 64-bit code possibly run if the CPU does not have 64-bit registers?

    I think it'd be bigger than 16 bit value, it looks like a 32 bit value.
    Looks more like a 31-bit value to me . But that doesn't matter. The C standard only guarantees 16-bit, so to be standard compliant, you still need to do something like
    Code:
    int random_number = (rand() << 16) ^ rand();
    Otherwise, for example, an implementation can appear to be much faster if it only generates 16-bit numbers, because they are much shorter.

    I'll run the code tonight.

    Regarding performance tests I'm aware it is a tricky matter so, as I said,
    we're not going into too much complexity and stay satisfied with what we
    got till now.
    That is fair. All tests have to stop somewhere. Now you just need to measure the error. Otherwise you don't know if the numbers you are getting have any meaning.

    The easiest way would be to run your code under a profiler, and see how much time is spent in each function. But then that gets complicated with function inlining.

    I'm fairly sure Pelles C doesn't have a profiler since I believe it's a fairly recent thing, so you'll have to switch to another implementation to do that. And then of course, the results cannot be carried back to your Pelles C version, because it would have a different implementation of rand()...

    Visual Studio has a profiler in one of the most expensive editions. Or you can use GCC with gprof.

  12. #87
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    OK, I checked the specifics of my processor, you are right it is considered
    a 64 bit architecture. I'm really happy with that. Thanks for drawing my attention
    to it.

    Regarding random numbers range, when I use the RAND() I get 32 bit numbers, so
    my compiler is not producing standard random numbers

    I haven't fully used all the features Pelles'C compiler has, I didn't even know
    it has a 64 bit version and it can transform a project from 32 to 64 bit.

    It has got a lot of nice things, but not really sure it has a profiler as well.

    I'm at the beginning of my learning path so I'm still more concerned about learning
    C syntax and standards than all the ide features I get in my compiler.

    At due time I hope I'll be able to use a profiler as well.

  13. #88
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by frktons View Post
    @testing:

    I decided to go the long hard way, thoroughly testing the whole integer range.
    We have capable machines, so let's use them
    Why? Seriously. Why? It's not like 10000 is going to work but 10001 is going to break. The only thing you need to test is zero, that each new digit range works, and minimum and maximum values. Anything else is really pointless. You're not using floating point numbers, so everything is predictable. It's not going to work for 333 but break for 334. It's just not.


    Quzah.
    Hope is the first step on the road to disappointment.

  14. #89
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I'm at the beginning of my learning path so I'm still more concerned about learning
    C syntax and standards than all the ide features I get in my compiler.
    Yeah that's why we have been suggesting moving on, and forget about all this performance stuff for now, because you can't really do much "valid" optimizations at this stage, without knowing a lot more.

    And pre-mature optimization is VERY evil.

    If you try to optimize all your functions like this, you will, after a few years time for a simple program, get a totally unreadable program that has more bugs than features.

    The first step of optimization is profiling. You don't want to optimize a function that only takes 0.1% of the run time. Even if you can make it take 0 time, your program is only 0.1% faster.

    If you take a function that takes 60% of the total time, and optimize it to be 10% faster, your program will be 6% faster overall.

    The itoa function will likely be the 0.1% function in any real code.

  15. #90
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by quzah View Post
    Why? Seriously. Why? It's not like 10000 is going to work but 10001 is going to break. The only thing you need to test is zero, that each new digit range works, and minimum and maximum values. Anything else is really pointless. You're not using floating point numbers, so everything is predictable. It's not going to work for 333 but break for 334. It's just not.

    Quzah.
    So you think we have tested enough the itoas() function and we can go to
    the next step? Good to know, I was almost sure it is fine as well, but for learning
    purposes I was building checking routines and having fun.

    Anyway I think you are right, so if I code something else to check itoas() validity is
    more for fun than for real need.

    Quote Originally Posted by cyberfish View Post
    Yeah that's why we have been suggesting moving on, and forget about
    all this performance stuff for now, because you can't really do much "valid" optimizations
    at this stage, without knowing a lot more.

    And pre-mature optimization is VERY evil.

    If you try to optimize all your functions like this, you will,
    after a few years time for a simple program, get a totally unreadable
    program that has more bugs than features.

    The first step of optimization is profiling. You don't want to optimize a
    function that only takes 0.1% of the run time. Even if you can make
    it take 0 time, your program is only 0.1% faster.

    If you take a function that takes 60% of the total time, and optimize it
    to be 10% faster, your program will be 6% faster overall.

    The itoa function will likely be the 0.1% function in any real code.
    I agree. I am not trying at this stage to optimize any further, I'm just checking
    the validity and correctness of the whole, before going
    to the next step: ===> Look Here
    Last edited by frktons; 07-14-2010 at 01:36 PM.

Popular pages Recent additions subscribe to a feed