Thread: What is the Fastest Printing to the Console?

  1. #16
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Redirecting output to a file makes a big difference in KC's program, but none in mine:

    Code:
    KC's program: 1.2 seconds down from 15 seconds
      My program: 2.1 seconds unchanged writing to stderr or stdout console
        "    "    2.9 seconds writing to stdout, redirected to a file
    This is with a 25 line vertical console window. A full sized console window will slow down the program, so I test them all with the same 25 line size window.

  2. #17
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I wasn't going to submit this program to the test, because it seemed way too slow, but since it's testing fast, I took the liberty of submitting it.

    It returned a wrong answer however, before the test was done. I submitted it twice, but received the same result.

    I have a testing function for all primes less than the 5,000,000th prime, so I'll see which answer is incorrect.

    I put KC's forum handle on here, but had to submit under my name, at the test site. As you can tell, I have a REAL eye for color, in Paint:
    [Attached image: knfromnconspoj (PNG)]

  3. #18
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    The problem was just an extra line of print regarding the time. I removed it, and resubmitted it and it passed! Well done, KC!
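
    A side note on why one stray timing line was enough to fail: an automated judge typically compares stdout byte for byte, so diagnostics have to go somewhere else or be compiled out. A minimal sketch, assuming the judge ignores stderr (not verified for this site). The names elapsed_seconds, report_elapsed and the TIMING macro are hypothetical, not taken from either program in this thread:

```c
#include <stdio.h>
#include <time.h>

/* CPU seconds consumed since `start` (a value returned earlier by clock()). */
double elapsed_seconds(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print a timing line on stderr, and only in builds made with -DTIMING,
   so stdout carries nothing but the answers. */
void report_elapsed(clock_t start, const char *label)
{
#ifdef TIMING
    fprintf(stderr, "%f (%s)\n", elapsed_seconds(start), label);
#else
    (void)start;   /* unused in judge builds */
    (void)label;
#endif
}
```

    Built without -DTIMING, report_elapsed() does nothing, so the same source can be timed locally and submitted unchanged.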
    [Attached image: kcfromnc2 (PNG)]

  4. #19
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    I'm using gcc -O2 -mtune=native. Obviously I'm redirecting the output as Salem's doing, since that's what the computer doing the judging will be doing as well.

    Code:
    $ time ./sphere_sieve.exe < sphere_sieve.txt | head
    
    0.249000 (time to sieve)
    0.140000 (time to init array which maps from prime count to prime value)
    2
    29
    541
    7919
    104729
    1299709
    15485863
    
    real    0m0.608s
    user    0m0.436s
    sys     0m0.169s
    Just shows how slow a P3 is compared to a modern "outdated" laptop...

    There's more room for optimization - you can eliminate another factor of 2-3(?) from the sieve array if you only track 6k+1/6k+5 values instead of all the odd numbers. And the indexing into the sieve array is about as inefficient as possible but good enough as a teaching tool. You can do lazy initialization of the prime count->value mapping. Plus there's no reason to free sieve, and no reason to call clock() if you're not printing it (not that these will make a huge difference...).
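
    To make the 6k+1/6k+5 idea concrete, here is a rough sketch of a sieve that keeps one flag per number coprime to 6, so it needs about limit/3 flags where an odds-only sieve needs limit/2. It is not the program discussed in this thread; wheel_value, wheel_index and count_primes_wheel6 are made-up names, and it uses a whole byte per flag for clarity, not speed:

```c
#include <stdlib.h>
#include <stddef.h>

/* Candidate i (0-based) is the i-th number >= 5 that is coprime to 6:
   5, 7, 11, 13, 17, 19, ... */
static unsigned long wheel_value(size_t i) { return 3UL * i + 5 - (i & 1); }

/* Inverse mapping; only valid for n >= 5 with n % 6 == 1 or n % 6 == 5. */
static size_t wheel_index(unsigned long n) { return (size_t)(n / 3 - 1); }

/* Count the primes <= limit. Returns -1 if the flag array cannot be allocated. */
long count_primes_wheel6(unsigned long limit)
{
    if (limit < 2) return 0;
    if (limit < 3) return 1;          /* just 2 */
    if (limit < 5) return 2;          /* 2 and 3 */

    size_t n = (size_t)(limit / 3);   /* upper bound on the candidate count */
    while (n > 0 && wheel_value(n - 1) > limit)
        n--;                          /* drop a candidate that overshoots limit */

    unsigned char *composite = calloc(n, 1);
    if (composite == NULL) return -1;

    long count = 2;                   /* 2 and 3 are handled outside the wheel */
    for (size_t i = 0; i < n; i++) {
        if (composite[i]) continue;
        count++;
        unsigned long p = wheel_value(i);
        /* Cross off p*m for every candidate m >= p. Candidates alternate
           gaps of 2 and 4; a 6k+5 value (even i) is followed by a gap of 2. */
        unsigned long step = (i & 1) ? 4 : 2;
        for (unsigned long m = p; m <= limit / p; m += step, step = 6 - step)
            composite[wheel_index(p * m)] = 1;
    }
    free(composite);
    return count;
}
```

    The inner loop only ever visits multipliers that are themselves coprime to 6, which is where the saving over an odds-only sieve comes from.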
    Last edited by KCfromNC; 11-15-2011 at 09:39 AM.

  5. #20
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Yes, I removed time.h and all associated lines of code, before re-submitting it.

    The testing rig has been described as a battery powered abacus.

    I'm working through a paper by Sorenson on the Sieve of Eratosthenes, which includes pseudo code on implementing the "wheel". Hopefully, I'll have my program "rolling along on wheels", soon, and that will enable it to pass the challenge - but we shall see.

    In your compiler, can you check how fwrite() is implemented?

    In Pelles C, fwrite() calls fputc() for every single char, and printing the primes with your program seems much too slow. I suspect that is why it's taking so long (15 seconds on my i7 at 3.5GHz). In Turbo C, fwrite() was quick (at least, I thought it was), but here, it seems quite slow. Hmmmm...

    Lots to study in your program, I've barely scratched the surface yet.
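
    One standard-library knob that may be worth trying when per-character output is the suspect is setvbuf(), which asks for stdout to be fully buffered so that individual fputc() calls only fill memory until the buffer is full. Whether it changes anything under Pelles C has not been tested here; the helper name use_big_stdout_buffer and the 64 KiB size are arbitrary choices for illustration:

```c
#include <stdio.h>

/* Ask the C library to fully buffer stdout with a 64 KiB buffer that it
   allocates itself. Call this before any other output on stdout.
   Returns 0 on success, nonzero if the request could not be honored. */
int use_big_stdout_buffer(void)
{
    return setvbuf(stdout, NULL, _IOFBF, 1 << 16);
}
```

    Call it once at the top of main(); the buffer is flushed automatically when the program exits normally.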

  6. #21
    'Allo, 'Allo, Allo
    Join Date
    Apr 2008
    Posts
    639
    If you're trying to optimize stuff, why are you using Pelles C? Its value-added stuff is fine, but when you've got the team efforts of VC, GCC, and Clang also available for free, its code gen quality is below the current par (cf. its current problems with simple code).

    That implementation of fwrite also sounds hideous if it doesn't batch the chars in a buffer and WriteConsole/WriteFile in bulk. If it does it one-by-one then there's your perf problem: every (and I mean every) operation on a console in Windows requires an IPC operation.
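
    The batching described above can also be done by the program itself, so it no longer matters how the library implements fwrite() internally. A minimal sketch, with hypothetical names (OutBuf, out_flush, out_put_number): digits are formatted by hand into a large buffer, and the buffer is handed to fwrite() in one call when it fills up:

```c
#include <stdio.h>
#include <stddef.h>

#define OUT_BUF_SIZE (1 << 16)

typedef struct {
    FILE  *fp;                  /* destination stream */
    size_t len;                 /* bytes currently pending in buf */
    char   buf[OUT_BUF_SIZE];
} OutBuf;

/* Hand everything pending to the stream with a single fwrite call. */
void out_flush(OutBuf *ob)
{
    if (ob->len > 0) {
        fwrite(ob->buf, 1, ob->len, ob->fp);
        ob->len = 0;
    }
}

/* Append the decimal form of n followed by '\n', flushing first if the
   buffer might not have room for the longest possible number. */
void out_put_number(OutBuf *ob, unsigned long n)
{
    char digits[24];
    size_t k = 0;

    if (ob->len + sizeof digits + 1 > OUT_BUF_SIZE)
        out_flush(ob);

    do {                        /* digits come out least significant first */
        digits[k++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    while (k > 0)               /* copy them back in the right order */
        ob->buf[ob->len++] = digits[--k];
    ob->buf[ob->len++] = '\n';
}
```

    Set fp to stdout and len to 0, call out_put_number() for each prime, and finish with one out_flush(). An OutBuf is 64 KiB, so a static or heap instance is a safer home for it than a small stack.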

  7. #22
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by adeyblue View Post
    If you're trying to optimize stuff, why are you using Pelles C? Its value-added stuff is fine, but when you've got the team efforts of VC, GCC, and Clang also available for free, its code gen quality is below the current par (cf. its current problems with simple code).

    That implementation of fwrite also sounds hideous if it doesn't batch the chars in a buffer and WriteConsole/WriteFile in bulk. If it does it one-by-one then there's your perf problem: every (and I mean every) operation on a console in Windows requires an IPC operation.
    I'm trying to optimize a prime number generator, that will meet the testing requirements of this challenge:
    http://www.spoj.pl/problems/TDKPRIME/

    It's no piece of cake, I'll tell you!

    I don't use Pelles C, except to write the code, and for my own testing. The actual test is done using GCC, right on the testing rig (which is a PIII CPU, running an abacus, on batteries, or maybe hamsters in a wheel).


    I'm using Pelles C because it's well suited to me, who used Turbo C, until last year. Yes, I know!!

    I believe it is hideous, and that's why I asked this question. This is what Pelles C help, says:

    Description:
    The fwrite function writes, from the array pointed to by src, up to num elements whose size is specified by size, to the stream pointed to by stream. For each object, size calls are made to the fputc function, taking the values (in order) from an array of unsigned char exactly overlaying the object. The file position indicator for the stream is advanced by the number of characters successfully written. If the file is open in text mode, LF will be translated to CR-LF. The translation will not affect the return value. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.
    When I read:
    "For each object, size calls are made to the fputc function,..."

    I nearly fainted, but I'm not sure how other compilers implement fwrite().

    All I know is, KC's program should RIP right through my test on an overclocked i7, (in less than 2 seconds), and it LOAFS, instead. It should be 15 times faster, (just an estimate).

    I will check with the experts on the Pelles C forum.
    Last edited by Adak; 11-15-2011 at 04:57 PM.

  8. #23
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Quote Originally Posted by Adak View Post
    I'm working through a paper by Sorenson on the Sieve of Eratosthenes, which includes pseudo code on implementing the "wheel". Hopefully, I'll have my program "rolling along on wheels", soon, and that will enable it to pass the challenge - but we shall see.
    I tried an implementation of a similar algorithm, and on numbers this small it was slower than my current approach. But I didn't spend too much time on it so I'm not sure what the issue was.

    In your compiler, can you check how fwrite() is implemented?
    It looks like it buffers writes which are smaller than a block size and passes larger writes directly to the system write call - usskim / fwrite. Which tells me I should recode to use read()/write() instead of fread() and fwrite(). This might help on your setup as well - although if the compiler is brain-dead enough to call fputc() for each character in fwrite() I'm not optimistic about it being any smarter for write().
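
    For reference, going straight to write() usually looks something like the sketch below. It assumes a POSIX system (unistd.h); on Windows toolchains the header and function names differ. write_all is a hypothetical helper name. The loop is needed because write() may accept fewer bytes than requested:

```c
#include <unistd.h>
#include <errno.h>
#include <stddef.h>

/* Write all len bytes of buf to fd, retrying after short writes and
   after interruption by a signal. Returns 0 on success, -1 on error
   (errno is left as set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

    With the whole output already formatted in one buffer, a single write_all(STDOUT_FILENO, buf, len) replaces every fwrite() call. Do not mix it with buffered stdio output on the same stream without flushing first, or the output can arrive out of order.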

    Lots to study in your program, I've barely scratched the surface yet.
    I have one that's about 33% quicker already working. I have it pre-sieving multiples of 3 now and cleaned up the sieve indexing quite a bit. It's getting near the point of diminishing returns...

  9. #24
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Well that's what you get when you try to fwrite() to a FILE* stream opened in text mode.

    It has NO choice but to write out a character at a time in order to be able to do the necessary \n to \r\n translations required of text mode on your platform.

    Try a binary file, and your own \n to \r\n translations "in situ", as you create your buffer.
    Though re-opening stdout as a binary file might be somewhat tricky in itself.
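
    The do-it-yourself translation suggested above is only a few lines. A minimal sketch with a hypothetical helper, expand_newlines, which copies a buffer while turning each \n into \r\n so the result can be written to a binary-mode stream in one call:

```c
#include <stddef.h>

/* Copy len bytes from src to dst, expanding every '\n' into "\r\n".
   dst needs room for 2 * len bytes in the worst case.
   Returns the number of bytes stored in dst. */
size_t expand_newlines(char *dst, const char *src, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';  /* carriage return goes in front */
        dst[out++] = src[i];
    }
    return out;
}
```

    For a program that builds its own output buffer, it is cheaper still to append the two characters directly while formatting each number. Switching stdout itself to binary mode is compiler specific: the Microsoft C runtime has _setmode() for this, but whether Pelles C offers an equivalent is not confirmed here.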

    Or you run on a real OS like KCfromNC does, and forget about the difference between text and binary modes.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #25
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I did create a version of my program that uses fwrite(), but of course, it was tested on the Windows 7 OS, and it was slower - so I dropped that idea.

    Since the testing rig uses Linux however, I'll have to revive it. I have a few problems with Linux versions that I've tried:

    1) Many Linux distros have made stunningly stupid mistakes, which they will fix in their NEXT release. You can solve it yourself if you just follow this 10 page summary, and it's your lucky day.

    I recall installing a version of Ubuntu, and following the "auto config" graphic of my HD. Ubuntu promptly formatted the wrong partition on the HD! OOPS! The graphic was backward on that distro! "We'll have to fix that in our next release."

    ext4 file system is a pile of dung. To prepare the data produced by a big work unit in Folding@Home, ext4 (now the default file system in Ubuntu), requires an extra hour and a half, over either ext3 or NTFS file systems.

    2) For some reason I don't understand, Linux stops your overclocking settings in the BIOS, and returns your PC to its default speed. I thought it was just an Ubuntu goof up, but the head guy at Arch Linux stated it was not.

    I run all my PCs except one, overclocked 24/7, working on various research projects (World Community Grid, Folding@Home, Rosetta, and yes, even SETI on occasion). Stock speed is just not an option.

    I may have to give Arch a second look though. I really like their philosophy, and they don't seem to have the Linux VD (verbosity disease), nearly as badly as Ubuntu, and some other distros.

    I've heard good things about CentOS and Debian, also.

  11. #26
    Registered User
    Join Date
    Aug 2008
    Location
    Belgrade, Serbia
    Posts
    163
    Give Arch a chance, you most certainly won't regret it.
    Vanity of vanities, saith the Preacher, vanity of vanities; all is vanity.
    What profit hath a man of all his labour which he taketh under the sun?
    All the rivers run into the sea; yet the sea is not full; unto the place from whence the rivers come, thither they return again.
    For in much wisdom is much grief: and he that increaseth knowledge increaseth sorrow.

  12. #27
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by Adak View Post
    Since the testing rig uses Linux however, I'll have to revive it. I have a few problems with Linux versions that I've tried:

    1) Many Linux distros have made stunningly stupid mistakes, which they will fix in their NEXT release. You can solve it yourself if you just follow this 10 page summary, and it's your lucky day.

    I recall installing a version of Ubuntu, and following the "auto config" graphic of my HD. Ubuntu promptly formatted the wrong partition on the HD! OOPS! The graphic was backward on that distro! "We'll have to fix that in our next release."
    Those kinds of problems aren't very widespread in this decade...
    ext4 file system is a pile of dung. To prepare the data produced by a big work unit in Folding@Home, ext4 (now the default file system in Ubuntu), requires an extra hour and a half, over either ext3 or NTFS file systems.
    That was because you did not turn off journaling when you needed good performance.
    2) For some reason I don't understand, Linux stops your overclocking settings in the BIOS, and returns your PC to its default speed. I thought it was just an Ubuntu goof up, but the head guy at Arch Linux stated it was not.
    True... it does not...
    I use cpufrequtils to overclock my desktop and underclock my netbook.

    I may have to give Arch a second look though. I really like their philosophy, and they don't seem to have the Linux VD (verbosity disease), nearly as badly as Ubuntu, and some other distros.
    What do you mean by that?


    Apologies for cluttering the fine ongoing discussion...please split this if the discussion is still hot.

  13. #28
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Quote Originally Posted by Salem View Post
    It has NO choice but to write out a character at a time in order to be able to do the necessary \n to \r\n translations required of text mode on your platform.
    I think the glibc code I linked to handles this without having to resort to printing one character at a time. Certainly scanning the output string in memory is quicker than repeated system calls.

    Or you run on a real OS like KCfromNC does, and forget about the difference between text and binary modes.
    I'm running under windows as well, just using more mature tools.

  14. #29
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Adak
    I may have to give Arch a second look though. I really like their philosophy, and they don't seem to have the Linux VD (verbosity disease), nearly as badly as Ubuntu, and some other distros.
    Quote Originally Posted by manasij
    What do you mean by that?
    I mean when you go to a distro's website to get some help with a problem, the answers are all usually right there, waiting for you:

    In a 15 page condensed version that someone has put together, out of the kindness of their heart. If not, there will be 30 threads on it, each with 25 replies, and each one having a part of the answer you need. It's common to spend an hour figuring out what to do, for a simple problem.

    I assure you, in Linux, excessive verbosity in replies to a query is excessively common! I want to USE my PC's operating system, not become a slave to its idiosyncrasies.

    I thought "journaling" was similar to "indexing" in Windows, and ran only in the background. I did not have this problem because I never trusted ext4. I don't change file system easily. Others did have this problem, however.

    Arch Linux is the only distro I know of that actually puts a premium on being simple, clear, and concise. A bit like good C.

  15. #30
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by Adak View Post
    I mean when you go to a distro's website to get some help with a problem, the answers are all usually right there, waiting for you:

    In a 15 page condensed version that someone has put together, out of the kindness of their heart. If not, there will be 30 threads on it, each with 25 replies, and each one having a part of the answer you need. It's common to spend an hour figuring out what to do, for a simple problem.

    I assure you, in Linux, excessive verbosity in replies to a query is excessively common! I want to USE my PC's operating system, not become a slave to its idiosyncrasies.

    I thought "journaling" was similar to "indexing" in Windows, and ran only in the background. I did not have this problem because I never trusted ext4. I don't change file system easily. Others did have this problem, however.

    Arch Linux is the only distro I know of that actually puts a premium on being simple, clear, and concise. A bit like good C.
    Well... your explanation of verbosity sounds startlingly similar to the experience of someone new to C stumbling into this (or a similar) forum with a question a little off the mark!

    AFAIK... journaling is a little different from indexing. Indexing builds a sort of database, loaded in memory, of what to find where, but journaling actually logs all your file operations 'virtually' and then performs the actual task on the disk at its leisure. (That makes error recovery quite trivial, but potentially affects performance.)

    I agree about Arch being simple....but getting it running to your taste from the barebones provided is a little tedious.
