Thread: What is the Fastest Printing to the Console?

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868

    What is the Fastest Printing to the Console?

    I have a lot of int's that I need to print super fast, to the console. Google hasn't been helpful.

    I've found increasing the number of integers printed, per call to printf(), is helpful, instead of printing one number per time through the printing loop.

    Instead of using:
    Code:
    for(i=0;i<BigNum;i++)
       prinf("%d", big[i]);
    It's faster to:
    Code:
    for(i=0;i<BigNum;i++)
      printf("%d\n%d\n%d\n, ... ", big[i],[big[i+1],big[i+2]...);
    I tried fwrite, but it wants char's or char arrays, not int's. By the time I transfer the int's all over to a char array, and print it, it's slower, not faster. (In very limited testing).

    I also tried opening stdout using low level I/O. (no luck so far). Is that the way to go? Oddly, I have never tried speeding up printing reams of unreadable data, to the console before.

    First time for everything, I guess. Thank you.

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Honestly, I doubt your program is the bottleneck. I'd bet it's the console itself, so optimizing your code probably isn't going to gain much. Having said that:

    printf is going to be a bit slow because of parsing the format string and all that. You can look for implementations of the non-standard function itoa(), strip out anything you don't need and adapt it to use putchar directly or something. It should be a bit faster. You might try using setvbuf to see if you get any improvement with fully buffered IO, so you can print a large number of ints to the console before flushing.

    Low-level IO should be a hair faster than C's standard API, but again, not sure it would be worth it or that it will solve the bottleneck. You'd still have to convert an int to text using something like itoa() in that case.

  3. #3
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    I suggest trying fprintf to stderr before doing much more work.

    Tim S.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    @Anduril: Thanks for the several suggestions. I have used itoa(), but it wound up being just a bit slower than the current fastest (multiple numbers per call to printf()).

    I didn't think to "hack" it, so that's a great idea to try. YOU are a dangerous coder, I can tell!

    The PC that will be running this, is on Linux, and using another compiler, so I'm not sure how far I can take a hack, but it will be fun to try.

    @Tim: I've tried every combination of printf() call you can imagine, I do believe: printf(); fprintf(stdout,), fprintf(stderr,), as well as versions of fwrite, fputc(), fputs()

    You're correct, fprintf(stdout, ) and fprintf(stderr, ) are the two champs, with identical time for printing 50K integers, at present. Unfortunately, I'm still unable to finish in the time needed.

    I should have been more detailed in my original post, but didn't want to make it too long. Any other idea's please don't hesitate, and thanks.

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Well, you need to compare
    Code:
    for(i=0;i<BigNum;i++)
       prinf("%d", big[i]);
    // with
    for(i=0;i<BigNum;i++)
       prinf("1234567");
    to figure out how much the formatting costs you.

    Then you should do the same with
    printf("1");
    vs
    printf("some long string padded to say 80 characters.....");


    But I find typically that console output is slugged in the driver somewhere for the benefit of human readers (and the console scrolling probably doesn't help either).

    myprogwithlargeoutput.exe
    can take minutes

    whereas
    myprogwithlargeoutput.exe > outputfile.txt
    takes only a fraction of that time.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Absolutely! One time I ran the program in question with the terminal window limited to just two rows vertically. The run was 300% faster!

    Unfortunately, the rig that will be running my program, is not my own, so I can't control the size of the terminal display window.

  7. #7
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    If your goal is to scroll the output quicker than a human can read it, why print it at all?

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Well KCfromNC, someone asked for help on a programming challenge. I gave some tips - others gave some tips - but no one was able to meet the challenge.

    Here's the challenge:

    Your program must read (up to) 50,000 queries from the testing rig, the first number you read will be how many queries your program will be given. The next (and I believe it's always 50K queries), numbers will be requests for your program to answer back with the kth prime number. So:

    50000
    5
    10
    50000
    etc.

    Your program must respond with
    11
    29
    86028121
    etc.


    Source code is limited in size to < 10,000 bytes. A prime_gen type program with 80 files total, won't work here.

    The primes are all made up with the Sieve of Eratosthenes now, and on my PC, that takes just 0.9 seconds. That sounds great, doesn't it?

    Except when submitted to the testing rig, it exceeds the time limit of 10 seconds. The testing rig, I have now discovered, is a PIII Xenon (not the battery powered abacus I had begun to fear it was!)

    2.6 seconds is the total run-time on my PC, including all I/O. About 1 second calculation, and the rest is time needed to display all 50K answers.

    I have pared the calculating code down to what I believe is the least possible, so what's left to improve is just the time needed to display the answers. It's almost nothing except a small for loop for input, and a nested pair of for loops, for the marking up of the Sieve array. Lastly, there is a large fprintf(stderr, ...) statement, that prints out 250 numbers per call.

    I don't know how my program did for time on the testing rig, because the test is stopped after 10 seconds, with just the result "Time Limit Exceeded".

    More experienced challengers (who have also failed though), are saying to use the Segmented Sieve of Eratosthenes algorithm. That is next on the list of things to try, but it's a much more challenging algorithm. Basically, it looks to cut the workload by jumping over any large gaps between the queries, if they exist.

    This is the results of the testing on this program. The green rows indicate success, the orange colored rows are fails. Nearly everyone fails.

    Sphere Online Judge (SPOJ) - Status

    And this is the test description:
    Sphere Online Judge (SPOJ) - Problem TDKPRIME

    The hilarity begins here, so innocently:
    find the nth prime prime number - C | DaniWeb

    Generating prime numbers, how tough can it be?

  9. #9
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    You can typically game these sorts of IO problems using fread @ the OS block size (typcially 4K) and than parsing the read string by hand in your code. By hand, I mean avoiding time consuming functions like atoi or sscanf and doing it yourself. Cutting out error checking and so on can save time.

    I'd assume you can do the same for the output. Create a huge character buffer than you fill in with a hand-coded loop (again, not sprintf() since it's too general purpose), then fwrite that out in one shot at the end of the code

    It's more gaming the system than optimizing, but at least it answers my question as to what the point is.

    The easiest way to do this - have you program read the input data and ftp it to a machine you have access to. Precompute the answers based on that input and then write a simple program to print the results. That will minimize the processing time and give you more time to waste in IO.
    Last edited by KCfromNC; 11-11-2011 at 10:08 AM.

  10. #10
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Something like this, perhaps?

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <math.h>
    #include <limits.h>
    #include <ctype.h>
    
    /* Pack sieve array.  Don't encode even numbers, and start the array
     * with 3, the first non-even prime.  If you count 1 as prime, it can
     * be added as a special case like 2 below so it doesn't mess up the 
     * mapping.
     * Next step is to pack from 1 character per prime to 1 bit per prime.
     * Since longs are 32 bits, divide again by 32 to get the word index and mask
     * off the low 5 bits (0-31) to get the amount to shift to find the bit
     * which corresponds to a given prime.
     */
    #define WORD_FROM_VAL(x) (((x) - 3)/ 64)
    #define BIT_FROM_VAL(x)  ((((x) -3)/ 2) & 0x1f)
    
    #define PRIME_5_MILLION 86028121
    int arr[5000000];
    int size=PRIME_5_MILLION/(2 * sizeof(unsigned long) * CHAR_BIT);
    
    int main() {
       double start,end;
       int q;
       start=clock();
       int i,j,m=0;
       int sieve_end = floor(sqrt(PRIME_5_MILLION)+0.1);
    
       unsigned long *sieve = calloc(size, sizeof(unsigned long));
    
       /* Up to 50,000 numbers to test, each up to 7 digits plus \r\n (or \n depending
        * on OS, but don't worry too much about an extra 50KB) */
       char *input = malloc (50001 * 9 + 1);
       char *output = malloc (50000 * 10 + 1);
    
       int curr_prime;
       for (curr_prime = 3; curr_prime <= sieve_end; curr_prime += 2)
       {
          if (!(sieve[WORD_FROM_VAL(curr_prime)] & (1UL << BIT_FROM_VAL(curr_prime))))
          {
             for(j = curr_prime*curr_prime; j < PRIME_5_MILLION; j+=curr_prime*2) 
             { 
                sieve[WORD_FROM_VAL(j)] |= 1UL << BIT_FROM_VAL(j);
             }
          }
       }
       end=clock();
       printf("\n%f",(end-start)/CLOCKS_PER_SEC);
       start=clock();
       arr[m++] = 2; /* Special case for 2 */
       for (i=3; i<=PRIME_5_MILLION; i+=2) {
          if (!(sieve[WORD_FROM_VAL(i)] & (1UL << BIT_FROM_VAL(i))))
             arr[m++]=i;
       }
       free(sieve);
       end=clock();
       printf("\n%f\n",(end-start)/CLOCKS_PER_SEC);
    
       int rc = fread(input, 1, 50001*9+1, stdin);
       input[rc] = '\0';
    
       const char *inp = input;
       char *outp = output - 1;
       char *outp_saved = output;
       unsigned in_num;
       unsigned out_num;
       unsigned out_num2;
    
       /* Skip first number */
       while (!isdigit(*inp)) inp++;
       while ( isdigit(*inp)) inp++;
    
       while (*inp)
       {
          /* Skip CR-LF */
          while (*inp && !isdigit(*inp)) inp++;
    
          /* Convert string to unsigned int */
          for (in_num = 0;*inp && isdigit(*inp); inp += 1)
          {
             in_num *= 10;
             in_num += *inp - '0';
          }
          if (in_num)
          {
             /* Turn prime count into prime value */
             out_num = arr[in_num-1];
    
             /* Move output pointer ahead, print in reverse */
             for (out_num2 = out_num; out_num2; outp += 1)
                out_num2 /= 10;
    
             /* At end of space for number, add CR-LF pair */
             outp_saved = outp+1;
             *outp_saved = '\n';
    
             /* Print digits in reverse order from least to most significant
              * so when they're read normally order is correct */
             for ( ; out_num; out_num /= 10)
                *outp-- = (out_num % 10) + '0';
             outp = outp_saved;
          }
       }
       fwrite(output, 1, outp - output + 1, stdout);
      
       return 0;
    }

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You had me rolling in the aisles!
    The easiest way to do this - have you program read the input data and ftp it to a machine you have access to. Precompute the answers based on that input and then write a simple program to print the results. That will minimize the processing time and give you more time to waste in IO.

    I'll have to study your program in detail. I'm sure that packing and shifting would help out. Most of those optimizations I have currently, but they aren't as snappy as what your code has.

    Right now, I'm implementing a "Wheel", which will further speed up the sieve loop (hopefully!). I'm now down to 2.1 seconds on my PC, so a bit more should be enough to get the run time down below ten seconds, on the P3 cpu based, testing rig.

    Interesting code, and well done! I look forward to studying it, thanks very much for the explanation and your time on this, KCfromNC.

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I'm not sure why, and quite surprised, but the program ran slowly.

    Here's a link to a test file similar to the one that the testing rig uses. It has one extra number (50,000) as the first number in the file, as per the test requirement.

    swoopshare - Download data.txt

    I use it in a command window:

    sieve < data.txt

    but I also use a timing program, so:

    ptime sieve < data.txt

    This is on an i7 cpu with Win 7 x64, using the Pelles C compiler (in 32 bit mode (it's default size), for compatibility with the P3 cpu of the testing rig).

  13. #13
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Not sure which program you meant. The one I posted ran in about 2/3rds of a second testing 50000 lines on my laptop. Did you enable optimization?

    Also note that a lot of sieving programs are optimized for testing primes way larger that this example. That may end up slowing things down rather than speeding them up.

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I was referring to your program.

    Run time is 15 seconds on an overclocked i7. All optimizations are on, of course.

    The unsigned longs are the same size as int's (4 bytes) on this PC, and compiler. Long long's are 8 bytes.

    What compiler are you using? I'm using Pelles C.

    I expected a fast time, so this was a big surprise.
    Attached Images Attached Images What is the Fastest Printing to the Console?-kcfromnc-png 

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    How long does it take to do
    myprog.exe > results.txt

    This will measure the best I/O speed.

    For your console, these things (and some others) could affect how long something takes
    - the size of the scrollback buffer
    - the amount of data already in the scrollback buffer
    - font and code page (for character translation, rendering lookup etc)
    These things all have to be done, but have nothing to do with your program, but still count as part of your programs execution time.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Printing text in console.
    By milky in forum C Programming
    Replies: 0
    Last Post: 11-22-2009, 02:12 PM
  2. Printing Unicode to console
    By jw232 in forum Windows Programming
    Replies: 7
    Last Post: 02-22-2009, 11:41 PM
  3. Printing on the console
    By balu14u in forum C Programming
    Replies: 3
    Last Post: 04-02-2005, 11:40 PM
  4. Bitmap Printing In Console C++
    By LostNotFound in forum Windows Programming
    Replies: 1
    Last Post: 03-10-2003, 08:14 AM
  5. Printing In Console C++
    By LostNotFound in forum C++ Programming
    Replies: 1
    Last Post: 02-15-2003, 05:46 PM