Thread: Best way to write a file

  1. #1
    Registered User
    Join Date
    Jul 2008
    Posts
    52

    Best way to write a file

    Hi everyone,

    I'm wondering if anyone knows the optimal way to write a very big tab-separated file whose data consists mostly of ints and floats. These ints and floats are rows of data that will be loaded into a database. What I'm doing now is accumulating about 500 rows in a string and then writing that string to disk. It's rather slow: it takes about 1.18 minutes to write everything to the file. Does anyone know a way to speed this up?

    Code:
      
     std::string to_file = "";
    
      for (int i=0; i<numRecords; ++i)  {    
        
        for (ssize_t fldidx = 0; fldidx <  hdr.Length(); ++fldidx)  {
           
          switch (field_type) {
          case UNDEFINED:
            continue;
            
          case STRING:
            continue;
            
          case CHAR: {
            continue;
          }
          case INT: {
            const int field_addr = field.mNumVal.i;
            char result[100];
            sprintf( result, "%d", field_addr );   
            to_file += result;
            break;
          }
            
          case FLOAT: {
            const float field_addr = field.mNumVal.f;
            char result[100];
            sprintf( result, "%f", field_addr );
            to_file +=  result;
            break;
          }
            
          case DOUBLE: {
            const double field_addr = field.mNumVal.d;
            // "%f" of a very large double can need over 300 characters
            char result[400];
            snprintf( result, sizeof result, "%f", field_addr );
            to_file += result;
            break;
          }
            
          case LONG: {
            const long field_addr = field.mNumVal.l;
            char result[100];
            // "%ld" is the conversion that matches a long argument
            sprintf( result, "%ld", field_addr );
            to_file += result;
            break;
          }
          }
          
          if (fldidx != hdr.Length()-1)
            to_file += "\t";
          else 
            to_file += "\n";
    
         }
        // write the accumulated rows every `bulk` records and after the last one
        if ( (i % bulk  == 0) || i == (numRecords-1) ) {
          fout << to_file;
          to_file.clear();
        }
      }

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Why not use the normal file output directly? Like this:
    Code:
    ...
          case INT: {
            fout << field.mNumVal.i;
            break;
          }
            
          case FLOAT: {
            fout << field.mNumVal.f;
            break;
          }
    ...
    Or have you already tried that?
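    For what it's worth, here is a minimal sketch of what streaming each field straight to the output could look like once the separators are added back. `Field`, `FieldType` and `write_row` are hypothetical stand-ins (the thread never shows the real record types), and only the numeric cases are covered:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>  // only needed for the in-memory example at the end
#include <vector>

// Hypothetical stand-ins for the thread's type tag and value union.
enum FieldType { INT, FLOAT, DOUBLE, LONG };

struct Field {
    FieldType type;
    union { int i; float f; double d; long l; } mNumVal;
};

// Stream one record as a tab-separated line, with no intermediate std::string.
void write_row(std::ostream& out, const std::vector<Field>& row) {
    for (std::size_t k = 0; k < row.size(); ++k) {
        if (k != 0) out << '\t';
        const Field& field = row[k];
        switch (field.type) {
        case INT:    out << field.mNumVal.i; break;
        case FLOAT:  out << field.mNumVal.f; break;
        case DOUBLE: out << field.mNumVal.d; break;
        case LONG:   out << field.mNumVal.l; break;
        default:     assert(false && "unhandled FieldType"); break;
        }
    }
    out << '\n';  // '\n' rather than std::endl, which would flush on every row
}

// Example (in-memory stream; a std::ofstream is used the same way):
//   std::ostringstream out;
//   write_row(out, row);
```

    Note that `<<` uses the stream's default formatting (six significant digits for floating point), which is not the same text as sprintf's "%f"; the stream would need `std::fixed` set to match. Whether this beats the string-accumulating version is something to measure, not assume.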

    --
    Mats

  3. #3
    Registered User
    Join Date
    Jun 2008
    Posts
    106
    Use the HDF (Hierarchical Data Format) library! I just finished using it, and it's quite fast. It can also compress your file.

  4. #4
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    I remember someone posting benchmark results for various methods of file writing, at one point. Bottom line was:

    ofstream slower than fwrite() *much* slower than direct API calls.

    I'm no expert, but I believe that if you're hardcore enough, you can use unbuffered file I/O to make a huge difference in speed, assuming you're careful and get it right.

    Another thing to experiment with would be to use SetEndOfFile() to allocate the entire file up front, so that your successive write operations don't have to keep extending it. I haven't tried it, so I can't tell you whether it actually works.

  5. #5
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Originally Posted by Hunter2
      I remember someone posting benchmark results for various methods of file writing, at one point. Bottom line was:

      ofstream slower than fwrite() *much* slower than direct API calls.

    Hmm. Did my own tests:
    Code:
    C:\...\Documents\Code\cs_ex\filespeed>filespeed stdio
    Doing stdio output...
    Std. I/O took 561ms.
    
    C:\...\Documents\Code\cs_ex\filespeed>filespeed ofstream
    Doing ofstream output...
    ofstream took 1014ms.
    
    C:\...\Documents\Code\cs_ex\filespeed>filespeed win32
    Doing Win32 output...
    ofstream took 951ms.
    All files have their handles closed before the time is stopped. Each test is run after the disk I/O from the previous settles, and each writes 100MB to disk. (Which is greater than most files that get written, and anything much bigger might run into disk I/O speeds, in which case whatever API you call will become irrelevant really quickly.) Listed times are the best out of several trials.

    Oddly, Win32 did not win. Also, telling the other two to fflush() and .flush() did not make a difference. When I called FlushFileBuffers() on the Win32 handle, Windows waited until the data was truly flushed, i.e., on disk! (My hard disk cannot write 100 MB in <2 sec.)

    Code is attached, for those interested.
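    The attachment isn't reproduced in this thread. As a rough idea of what such a comparison involves, here is a minimal sketch (not the attached code) that times the stdio and ofstream paths; the function names, file paths and chunk size are all assumptions, and the Win32 case is left out:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

// Run fn() once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Write total_bytes to path with fwrite(), at most chunk bytes per call.
// The file is closed inside the function so the close is part of the timing.
void write_stdio(const char* path, std::size_t total_bytes, std::size_t chunk) {
    assert(chunk > 0);
    const std::vector<char> block(chunk, 'x');
    std::FILE* f = std::fopen(path, "wb");
    assert(f != nullptr);
    for (std::size_t done = 0; done < total_bytes; ) {
        const std::size_t n = std::min(chunk, total_bytes - done);
        const std::size_t written = std::fwrite(block.data(), 1, n, f);
        assert(written == n);
        done += n;
    }
    std::fclose(f);
}

// Same workload through std::ofstream, also closed before returning.
void write_ofstream(const char* path, std::size_t total_bytes, std::size_t chunk) {
    assert(chunk > 0);
    const std::vector<char> block(chunk, 'x');
    std::ofstream out(path, std::ios::binary);
    assert(out.is_open());
    for (std::size_t done = 0; done < total_bytes; ) {
        const std::size_t n = std::min(chunk, total_bytes - done);
        out.write(block.data(), static_cast<std::streamsize>(n));
        done += n;
    }
    out.close();
}
```

    A call such as `time_ms([] { write_stdio("stdio.bin", 100u * 1024 * 1024, 64 * 1024); })` gives one sample. No timings are claimed for this sketch; as described above, take the best of several trials and let the disk settle between runs.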

    Now, for reading at least, I have found that requesting more data at a time works better. Don't fread() 1 byte at a time. Additionally, serializing the data before the actual I/O (such as a call to sprintf() or something) takes some time, and might be a bottleneck. Your profiler is your friend.
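    To illustrate the same idea on the writing side, here is a rough sketch (again, not the attached code) of formatting numbers straight into one large buffer and handing it to fwrite() in big pieces, instead of appending small temporaries to a std::string. `ChunkWriter` and its method names are made up for illustration, and the 1 MB default buffer is an arbitrary assumption, not a measured optimum:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: formats each field directly into one large buffer
// and passes the buffer to fwrite() in big pieces.
class ChunkWriter {
public:
    explicit ChunkWriter(std::FILE* file, std::size_t capacity = 1 << 20)
        : file_(file), buf_(capacity), used_(0), ok_(true) {
        assert(file != nullptr && capacity >= kMaxField);
    }

    void put_int(int v)       { advance(std::snprintf(slot(), kMaxField, "%d", v)); }
    void put_long(long v)     { advance(std::snprintf(slot(), kMaxField, "%ld", v)); }
    void put_double(double v) { advance(std::snprintf(slot(), kMaxField, "%f", v)); }
    void put_char(char c)     { *slot() = c; advance(1); }

    // Write out whatever is buffered. Call once more after the last record.
    // Returns false if any write so far has come up short.
    bool flush() {
        if (std::fwrite(buf_.data(), 1, used_, file_) != used_) ok_ = false;
        used_ = 0;
        return ok_;
    }

private:
    // Upper bound on one formatted field; "%f" of the largest double needs
    // a little over 300 characters.
    static const std::size_t kMaxField = 512;

    // Pointer to at least kMaxField free bytes, flushing first if needed.
    char* slot() {
        if (buf_.size() - used_ < kMaxField) flush();
        return &buf_[used_];
    }

    // Account for n characters just written at slot().
    void advance(int n) {
        assert(n > 0 && static_cast<std::size_t>(n) < kMaxField);
        used_ += static_cast<std::size_t>(n);
    }

    std::FILE* file_;
    std::vector<char> buf_;
    std::size_t used_;
    bool ok_;
};
```

    Usage would be along the lines of `ChunkWriter w(fp); w.put_int(x); w.put_char('\t'); w.put_double(y); w.put_char('\n');` per record, with one `w.flush()` at the end. It still goes through snprintf(), so whether the formatting or the I/O dominates is something only the profiler can tell you.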

  6. #6
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    For interest's sake, I compiled it with FILE_FLAG_NO_BUFFERING on the win32 version. Maybe I did something funny, but it took roughly 30 times longer than any of the other tests. So I guess disregard my previous post.

    P.S. I also noticed that each test varied hugely; for example, the fwrite() test went from 2.3s to ~9s between two tests, and similar results with win32. On average though, fwrite() was faster than win32 and win32 was faster than fstream.

    P.P.S. SetEndOfFile() does seem to improve performance somewhat:
    Code:
    SetFilePointer(hFile, OUTFILE_SIZE, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);
    SetFilePointer(hFile, 0, NULL, FILE_BEGIN);
    Note that this may only help if the file doesn't already exist.
    Last edited by Hunter2; 08-02-2008 at 04:23 PM.

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Code:
    Std. I/O took 561ms.
    ofstream took 1014ms.
    ofstream took 951ms.
    O_o

    Yea... you need a better C++ library implementation. The difference between the three should be negligible for such a small file.

    (You also need to fix the bug in your code that prints 'ofstream' for the Windows API case.)

    Soma
