Thread: Optimizing large file reading

  1. #1
    Read Backwards
    Join Date
    Sep 2005
    Location
    Earth
    Posts
    282

    Optimizing large file reading

    Ok, so I have created a program that takes user input, checks some data files and does stuff.

    My only problem is, my data file is huge. It was nearly a 1 meg text file!
    So, I split it up into multiple files, that way it does not have to check every part of the files. Good.
    But some of them are still pretty large, and the way the program works it might have to check the files multiple times. All that works fine… but it takes so long.

    I am using a standard ifstream. But every time it gets a new line to check, it is actually accessing the file. Is there a way to put the entire file into memory and then get each new line from memory, as I imagine that would be way, way faster?

  2. #2
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    One idea is to read the file into a vector of strings.
    Code:
       #include <fstream>
       #include <iostream>
       #include <string>
       #include <vector>
       using namespace std;

       vector<string> V;
       string line;

       // Read every line of the file into memory once.
       ifstream in(filename);
       while (getline(in, line))
          V.push_back(line);

       // From here on, each line comes from the vector, not the disk.
       for (vector<string>::iterator it = V.begin(); it != V.end(); ++it)
          cout << *it << endl;
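    A variation on the same idea is to slurp the whole file into a single in-memory buffer and then getline from that buffer instead of from the disk. Something along these lines (untested sketch; dump_file is just a made-up helper name):
    Code:
       #include <fstream>
       #include <iostream>
       #include <sstream>
       #include <string>
       using namespace std;

       void dump_file(const char *filename)
       {
          // Pull the entire file into an in-memory buffer in one go.
          ifstream in(filename);
          stringstream buffer;
          buffer << in.rdbuf();

          // Every getline after this point reads from memory, not the disk.
          string line;
          while (getline(buffer, line))
             cout << line << endl;
       }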
    Another idea is to use memory mapped I/O. There have been some examples of this posted before. I'm not familiar enough with the function calls to show you how. Try doing a board search if you're interested.

  3. #3
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Reading into a vector of strings only works if you have plenty of free memory (unsurprisingly, to read a 1MB file, the memory you need will probably exceed 1MB).

    What Shane (backwards) is describing is one of the classic trade-offs of handling files. Any operations with disk files take a long time compared with what you can do with them once they are in memory, but you need a lot of memory to work with them. And one of the issues of loading and then manipulating files in memory is that you need to write them back --- and that can sustain the performance hit or introduce a potential for lost data (e.g. if there is a power surge just at the point where you are trying to write data back to disk).

    Some operating systems do allow files to be mapped directly to memory (look up "memory mapped files"), which means the operating system will optimise your access to the file and handle most of the issues of ensuring the data gets to and from disk. The schemes to do this, however, are highly operating system dependent.
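
    On a POSIX system, for instance, that usually means open() plus mmap(). A minimal sketch, assuming POSIX, with most error handling left out and a placeholder file name:
    Code:
       #include <sys/mman.h>
       #include <sys/stat.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <cstdio>

       int main()
       {
          int fd = open("data.txt", O_RDONLY);   // placeholder file name
          struct stat sb;
          fstat(fd, &sb);

          // Ask the OS to map the whole file into the process's address space.
          void *p = mmap(0, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
          if (p == MAP_FAILED)
          {
             perror("mmap");
             return 1;
          }
          const char *data = static_cast<const char *>(p);

          // The file now reads like an ordinary in-memory array; the OS pages
          // it in from disk as it is touched. For example, count the lines:
          long lines = 0;
          for (off_t i = 0; i < sb.st_size; ++i)
             if (data[i] == '\n')
                ++lines;
          printf("%ld lines\n", lines);

          munmap(p, sb.st_size);
          close(fd);
          return 0;
       }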

    If there is any structure in the files (e.g. one block at about the 500K mark is always being manipulated, and no other) it may be possible to get away with just keeping a part of the file in memory and manipulating it there.
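
    For example, if the block that matters always sits somewhere around the 500K mark, you could seek straight to it and read just that slice into a buffer. A rough sketch (the offset, block size and helper name below are made up):
    Code:
       #include <fstream>
       #include <vector>
       using namespace std;

       // Hypothetical numbers: the interesting block starts near 500K and is 64K long.
       const streamoff blockStart = 500 * 1024;
       const size_t    blockSize  = 64 * 1024;

       void load_block(const char *filename, vector<char> &block)
       {
          // Jump straight to the block instead of reading everything before it.
          ifstream in(filename, ios::binary);
          in.seekg(blockStart);

          // Read just that slice of the file into memory and work on it there.
          block.resize(blockSize);
          in.read(&block[0], block.size());
          block.resize(in.gcount());   // shrink if the file ended early
       }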

