Thread: Testpad for itoa()

  1. #46
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Elysia View Post
    I have no idea how iMalc implemented it using minus and multiplication, though.
    Here's a good link with an explanation of the technique. Hopefully not too math heavy.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  2. #47
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Here's an updated version which includes some small lookup tables for a further slight improvement.
    I haven't updated the testbed at this stage, though.

  3. #48
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by iMalc View Post
    Here's an updated version which includes some small lookup tables for a further slight improvement.
    I haven't updated the testbed at this stage, though.
    Thanks iMalc.

    Can you explain why you are declaring:
    Code:
    int itoa(long snum, char s[])
    and using

    Code:
        return ps - s;
    ?

    What are these for? I rewrote them this way:

    Code:
    void itoa(long snum, char s[])
    and

    Code:
        return;
    Because we are not using the return value, pushing a return value onto
    the stack uses CPU cycles for no purpose, at least in this context.

    By the way, the performance I get is not any better than the previous
    one; it's even 2-3 ms slower. (?)

    One last question, if you'd like to elaborate: you said the algorithm
    you are applying here is not the best you can get, because compilers
    already do that even better.

    What would be the difference between your version and the version
    a smart compiler produces?

    Thanks again for your collaboration. The idea behind this optimization
    is very clever and I hope I'll be able to grasp it one day. :-)
    Last edited by frktons; 07-10-2010 at 02:23 AM.

  4. #49
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by frktons
    What are these for?
    To return the length of the string.
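    For example, the returned length saves a strlen() call if you want to append to the buffer afterwards. A minimal sketch, using the int itoa(long snum, char s[]) signature from earlier in the thread:
    Code:
    char buf[32];
    int len = itoa(12345L, buf);   /* buf now holds "12345", len == 5 */
    buf[len] = '\n';               /* the length lets us append without calling strlen() */
    buf[len + 1] = '\0';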
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #50
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by laserlight View Post
    To return the length of the string.
    Thanks. It could be useful in another context. ;-)

  6. #51
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Quote Originally Posted by frktons View Post
    By the way, the performance I get is not any better than the previous
    one; it's even 2-3 ms slower. (?)

    One last question, if you'd like to elaborate: you said the algorithm
    you are applying here is not the best you can get, because compilers
    already do that even better.

    What would be the difference between your version and the version
    a smart compiler produces?
    Not to answer in iMalc's stead but I did try to warn you that you've hit a performance plateau.

    And the compiler's doing instructions better is a positive thing.

    The fastest version is not meant to be a nearly unreadable set of instructions like you've been getting. Normally you would use math to do this, as iMalc and several people have, and minimize complicated logic. Then, you turn on the compiler's optimization switch (provided you understand what your options are), and the compiler will do things in the underlying assembly, say, to get speed.

    iMalc has gone into detail about said optimization as it applies to his code.

    If you attempt to do such optimizations in plain code, without knowing what you are doing more than anyone usually does, you will just convince the compiler to use different assembly instructions for the operation, which may or may not be faster.

  7. #52
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by whiteflags View Post
    Not to answer in iMalc's stead but I did try to warn
    you that you've hit a performance plateau.

    And the compiler's doing instructions better is a positive thing.

    The fastest version is not meant to be a nearly unreadable set of instructions like you've been getting. Normally you would use math to do this, as iMalc and several people have, and minimize complicated logic. Then, you turn on the compiler's optimization switch (provided you understand what your options are), and the compiler will do things in the underlying assembly, say, to get speed.

    iMalc has gone into detail about said optimization as it applies to his code.

    If you attempt to do such optimizations in plain code, without knowing what you are doing more than anyone usually does, you will just convince the compiler to use different assembly instructions for the operation, which may or may not be faster.
    I ask because I have some doubts. When somebody posted a 0 ms result
    versus 100-200 ms, my logic freaked out. So I asked some smart Assembly
    guys, who just replied: "NO WAY, show us the Assembly produced by the
    compiler and we'll show you what nasty trick the compiler is playing".

    Now, with random numbers, the miracle of "0 ms performance" is over.
    Those guys explained to me more or less what the compiler was doing.

    Performance is not the main subject of this thread; we are testing the
    various posted solutions for correctness, so if we have reached a
    performance plateau it doesn't really matter.

    I'm here to learn, or to have a look at what to learn in the future.
    Curiosity is not an unforgivable sin. I just asked iMalc why he
    is not happy about applying the math that he showed us.

    I can understand your warnings, and they are fair. Unfortunately, the
    endless debate between Assembly guys and compiler adepts is not going to
    stop. I'm not in that debate, and I don't want to be involved.

    Try to think of it this way: some 30 years ago I was playing with COBOL on
    big old mainframes. Now that I'm free from that hardware and can play
    with new toys, I would really appreciate the freedom to make a lot of the
    mistakes I was not allowed to make back then.

    Bear with me my friend, let me learn it the hard way. :-)
    Last edited by frktons; 07-10-2010 at 04:27 AM.

  8. #53
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by frktons View Post
    Because we are not using the return value, pushing a return value onto
    the stack uses CPU cycles for no purpose, at least in this context.
    You are being paranoid. The time taken there is negligible compared to the rest of the code; it probably won't impact the result at all.

    As for the 0 ms thingy, it's like I said. The compiler simply removes all of the code because it detects that it actually does nothing. You calculate a result and then throw it away. That's unnecessary because you're not storing it anywhere, so the compiler could easily remove the code and gain speed.
    So the code itself isn't super hyper ultra fast--it's gone. That's what I meant. iMalc's code allowed the compiler to figure out what it was doing enough to deem it unnecessary.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  9. #54
    Registered User
    Join Date
    Jun 2010
    Posts
    182
    Quote Originally Posted by Elysia View Post
    You are being paranoid. The time taken there is negligible compared to the rest of the code; it probably won't impact the result at all.

    As for the 0 ms thingy, it's like I said. The compiler simply removes all of the code because it detects that it actually does nothing. You calculate a result and then throw it away. That's unnecessary because you're not storing it anywhere, so the compiler could easily remove the code and gain speed.
    So the code itself isn't super hyper ultra fast--it's gone. That's what I meant. iMalc's code allowed the compiler to figure out what it was doing enough to deem it unnecessary.
    Thanks Elysia. If I remember correctly, GCC was playing the same nasty
    trick with my code as well, not only with iMalc's, as cyberfish revealed.

    As for the first part of your comment, I agree almost completely, except for
    the very first bit. :-) Don't worry about my mental sanity, I can make it. ;-)

  10. #55
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by frktons View Post
    Thanks iMalc.

    Can you explain why you are declaring:
    Yup, string length, as already noted by laserlight.
    By the way, the performance I get is not any better than the previous
    one; it's even 2-3 ms slower. (?)
    I got a slight improvement using Visual Studio, but it's certainly small enough that it's likely to vary from slightly better to slightly worse depending on compiler, platform, etc. It trades a multiplication for a table lookup, which isn't always a win, I guess.

    One last question, if you'd like to elaborate: you said the algorithm
    you are applying here is not the best you can get, because compilers
    already do that even better.

    What would be the difference between your version and the version
    a smart compiler produces?
    Because the multiplicative inverse I was using only allowed me to perform the simulated division on numbers up to 10000 but not 100000, you'll notice that I had to split the number up into three variables, num1, num2, and num3, each being at most 9999. When the compiler performs the same multiplicative-inverse trick, it can use instructions that calculate the full 64-bit multiplication result and then just keep the highest 32 bits of that. This means it doesn't need the extra step of breaking the number up into num1, num2 and num3 first, possibly freeing up some registers as well as avoiding the two divides I had.
    On the plus side, though, the code I wrote may end up around the same size overall: the constants used in my machine code are smaller than the constants the compiler would use, since it can't assume the numbers are below 10000, and it may generate slightly more code when doing the same thing itself, to make sure it doesn't overflow a 64-bit multiplication as it can in some cases, as explained in the article I linked earlier.
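    Not necessarily the same constants as in my code, but here's a minimal self-contained sketch of the multiply-then-shift idea in isolation, using the textbook reciprocals for dividing by 10. The 16-bit constant is the kind that becomes usable once the number has been split into pieces below 10000; the 64-bit-multiply version is roughly what an optimizing compiler emits for a plain n / 10:
    Code:
    #include <assert.h>
    #include <stdint.h>

    /* n / 10 for small n (valid for n <= 16383): a 16-bit reciprocal suffices. */
    static unsigned div10_small(unsigned n)
    {
        return (n * 6554u) >> 16;                 /* 6554 = ceil(2^16 / 10) */
    }

    /* n / 10 for any 32-bit n: do the full 64-bit multiply, keep only the high part. */
    static uint32_t div10_full(uint32_t n)
    {
        return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);   /* 0xCCCCCCCD = ceil(2^35 / 10) */
    }

    int main(void)
    {
        for (uint32_t n = 0; n < 100000; ++n) {
            if (n <= 16383)
                assert(div10_small(n) == n / 10);
            assert(div10_full(n) == n / 10);
        }
        return 0;
    }
    Since an optimizing compiler already produces the div10_full form for a plain n / 10 on its own, writing it out by hand is mostly useful for understanding what the generated code is doing.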

    Thanks again for your collaboration. The idea behind this optimization
    is very clever and I hope I'll be able to grasp it one day. :-)
    It probably helps to think of it in several parts:
    • Loop unrolling.
    • Multiplicative inverse to perform division.
    • Reducing total branch count by reordering tests.
    • Pointer arithmetic instead of array indexing.
    • Lookup tables.
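    As a toy illustration of the last two of those (not taken from my actual code): a table of the 100 two-character digit pairs lets a single lookup emit two digits at a time, written through a pointer rather than an indexed array.
    Code:
    #include <stdio.h>
    #include <string.h>

    /* "00".."99" packed into one string: 200 characters plus the terminator. */
    static const char digit_pairs[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";

    int main(void)
    {
        char buf[3] = {0};
        unsigned v = 42;                       /* any value in 0..99 */
        char *p = buf;                         /* write through a pointer */
        memcpy(p, &digit_pairs[v * 2], 2);     /* one lookup emits both digits */
        p += 2;                                /* advance past the two digits written */
        printf("%s\n", buf);                   /* prints 42 */
        return 0;
    }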


    I did make a version which had the same branching scheme that reduced the number of gotos to two, just to see how low I could get it, but I'm actually going to have to admit that the version with gotos was clearer, as the code wasn't nested about ten levels deep.

    Quote Originally Posted by Elysia View Post
    So the code itself isn't super hyper ultra fast--it's gone. That's what I meant. iMalc's code allowed the compiler to figure out what it was doing enough to deem it unnecessary.
    Which is pretty impressive of the compiler if you ask me! It presumably would have had to determine that there were:
    • no calls to other functions,
    • no accesses to globals,
    • only writes to an array that was never read from,
    • and no use of the return value.
    Violating perhaps any one of those would probably have foiled its efforts.
    Last edited by iMalc; 07-10-2010 at 02:34 PM.

  11. #56
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    @frktons:

    Your machine is 64-bit. There is no "partial" 64-bit mode. If you are running a 64-bit OS, the CPU is running in 64-bit mode.

    The C2D is 64-bit capable.

    And I've decided to scrap my version. It's nothing special, just good old lookup tables.

    Also, the C standard only guarantees that RAND_MAX is at least 32767, so it's possible (and very likely) that it's only generating numbers between about -32000 and 32000. To make sure you have the full range, do something like,
    Code:
    int random_number = (rand() << 16) ^ rand();   /* spread random bits into the upper half as well */
    And you should really exclude the random number generation time from the timing, since it depends on the implementation (different compilers implement rand() differently: some go for higher-quality random numbers, some go for speed).

    srand(0) does NOT make sure we are all using the same random numbers. Using the same seed only guarantees you will get the same sequence FROM THE SAME IMPLEMENTATION. But that probably doesn't matter if the iteration count is big enough.
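    Something along these lines keeps rand() out of the measured region and also consumes the results so the compiler can't discard the conversions. itoa_under_test is just a placeholder here; swap in whichever itoa(long, char[]) version from the thread you want to time.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ITERATIONS 1000000

    /* Placeholder for the function being benchmarked. */
    static int itoa_under_test(long snum, char s[])
    {
        return sprintf(s, "%ld", snum);
    }

    int main(void)
    {
        static long numbers[ITERATIONS];
        char buf[32];
        unsigned long sink = 0;

        srand(0);
        for (int i = 0; i < ITERATIONS; ++i)             /* inputs generated up front, */
            numbers[i] = ((long)rand() << 16) ^ rand();  /* so rand() is not timed     */

        clock_t start = clock();
        for (int i = 0; i < ITERATIONS; ++i) {
            itoa_under_test(numbers[i], buf);
            sink += (unsigned char)buf[0];               /* read the output so it isn't dead code */
        }
        clock_t end = clock();

        printf("%.0f ms (sink=%lu)\n", (double)(end - start) * 1000.0 / CLOCKS_PER_SEC, sink);
        return 0;
    }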

  12. #57
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Random numbers are not very good for measuring performance, since we get different numbers a lot of the time.
    But they are better for testing: they make sure we try a greater range of numbers and stop the compiler from assuming the input will be only one kind of number.
    Using srand(NULL) is better than not using random numbers.

  13. #58
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Why don't you just use a set distribution of numbers that covers the whole range?

    Code:
    for (i = 0; i < maximum; i += maximum / num_iterations)
         format(i);
    sort of thing

  14. #59
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Quote Originally Posted by KBriggs View Post
    Why don't you just use a set distribution of numbers that covers the whole range?

    Code:
    for (i = 0; i < maximum; i += maximum / num_iterations)
         format(i);
    sort of thing
    If you have a large lookup table, for example, that won't give you accurate memory performance because most of the digits will stay the same.

  15. #60
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Not only that, but if maximum is 0x7FFFFFFF then you could easily overflow.
    How about a random stepsize that gradually increases over time?
    Code:
    long stepsize = 0;
    for (long i = 0x80000000; i <= 0x7FFFFFFF - stepsize; i += stepsize)
    {
        itoa(i, buffer);
        /* pick a random step; the bound grows with the unsigned value of i */
        stepsize = 1 + rand() % ((((unsigned long)i) >> 7) + 1);
    }
    Last edited by iMalc; 07-11-2010 at 01:37 PM.
