Thread: Reading large files (20-110MB) into memory

  1. #1
    Welcome to the real world
    Join Date
    Feb 2004
    Posts
    50

    Reading large files (20-110MB) into memory

    I was trying to think of efficient ways to read large files (20-110MB) into memory. I have a function where I must read an entire file, assign it to an unsigned char*, and then pass it to another function for evaluation. The char* must represent the contents of the file.

    Is there a way to do this efficiently, or is the best method simply to read the entire file in one chunk? Obviously there will be a performance hit if many files have to be read. Consider doing this for more than 10GB of data in total...

    Any ideas are greatly appreciated.
    Last edited by OOPboredom; 02-29-2004 at 08:05 PM.

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Well, if you must pass a pointer to char and that pointer must contain the entire contents of the file, there really isn't much you can do aside from rethinking your design. Reading a 20-110MB file into memory should be done only as a last resort. If you can pass a FILE* to the function, or make multiple calls with smaller blocks, then you should do so.

    Concerning performance, if you must have the file in memory, taking it in blocks as large as the system handles efficiently until you have everything is really the best you can do. At these sizes, and considering your restrictions, you're looking at a startup performance hit no matter what solution you choose.
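
    Something along these lines, for instance (just a sketch; process_block() here is a stand-in for whatever actually consumes the data):
    Code:
    #include <cstdio>

    /* Stand-in for whatever actually consumes each block of data. */
    void process_block(const unsigned char *data, std::size_t len)
    {
        (void)data;
        (void)len; /* real work goes here */
    }

    /* Stream the file through a fixed-size buffer instead of
       loading the whole thing into memory at once. */
    void process_file(const char *path)
    {
        std::FILE *fp = std::fopen(path, "rb");
        if (!fp)
            return;

        unsigned char buffer[64 * 1024]; /* 64KB blocks; tune as needed */
        std::size_t n;

        while ((n = std::fread(buffer, 1, sizeof buffer, fp)) > 0)
            process_block(buffer, n);

        std::fclose(fp);
    }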
    My best code is written with the delete key.

  3. #3
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Consider doing this for more than 10GB of data in total...

    How many users do you think are going to have that much RAM to spare at any given moment? Besides that, most OSes place a limit on how much memory a single process can allocate - on a 32-bit system the address space alone caps you at around 2-3GB. I agree with Prelude - just pass file handles around and process the data in smaller chunks. What sort of application is it, by the way?
    Code:
    #include <cmath>
    #include <complex>
    // Flips a bool via Euler's formula: raises e to i*pi (value == false)
    // or i*2*pi (value == true) and tests the sign of the real part.
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)),     // e
            std::complex<float>(0, 1)               // i
            * std::complex<float>(std::atan(1.0)    // pi/4
            * (1 << (value + 2)))                   // * 4 or 8 -> pi or 2*pi
        ).real() < 0;                               // e^(i*pi) = -1 -> true
    }

  4. #4
    Welcome to the real world
    Join Date
    Feb 2004
    Posts
    50
    According to the design specification I do not have any flexibility in how I pass the data. The char* must represent the exact contents of the file.

    The function that I pass the char* to only takes a char* and computes a hash value from it. I wish it were different so that I could pass blocks of data cumulatively, but I cannot. It simply looks like I'll have to read entire files in one chunk, which leads me to my next question.

    If I allocate space for the unsigned char pointer, would it be best to get the size of the file and allocate accordingly, or should I read 512-byte blocks and keep appending what I read to the previously read data? It might be easier to just allocate one large chunk of memory, but what happens when I have to read a 100MB file and represent that file by a char*?
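
    Something like this is what I had in mind for the first approach (just a rough sketch; it assumes the file fits in memory and that ftell() can represent its size):
    Code:
    #include <cstdio>
    #include <cstdlib>

    /* Rough sketch: size the buffer once from the file length, then fill
       it with a single read. The caller owns (and must free) the buffer. */
    unsigned char *read_whole_file(const char *path, std::size_t *out_len)
    {
        std::FILE *fp = std::fopen(path, "rb");
        if (!fp)
            return 0;

        std::fseek(fp, 0, SEEK_END);
        long size = std::ftell(fp);
        std::rewind(fp);

        unsigned char *buf = 0;
        if (size > 0)
        {
            buf = (unsigned char *)std::malloc(size);
            if (buf && std::fread(buf, 1, size, fp) != (std::size_t)size)
            {
                std::free(buf);
                buf = 0;
            }
        }
        std::fclose(fp);
        if (buf)
            *out_len = (std::size_t)size;
        return buf;
    }
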
    Last edited by OOPboredom; 03-01-2004 at 04:51 PM.

  5. #5
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>The function that I pass the char* to only takes a char*
    Then how does it know how long the data is? You cannot assume a \0 terminated array in this scenario.

    >> I do not have any flexibility in how I pass the data. The char* must represent the exact contents of the file.
    Has anyone said you can't call this function more than once?

    >>what happens when I have to read a 100MB file and represent that file by a char*
    Your RAM starts to fill up, and the OS swaps memory out to disk, making the whole thing inefficient.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  6. #6
    Welcome to the real world
    Join Date
    Feb 2004
    Posts
    50
    Originally posted by Hammer
    >>Then how does it know how long the data is? You cannot assume a \0 terminated array in this scenario.
    I forgot to mention that I also pass in the size of the data in bytes, and the struct to place the result in. The algorithm being used is SHA-1 - the standard hash algorithm.

    >> Has anyone said you can't call this function more than once?
    Under the guidelines of the SHA-1 algorithm I can only call it once.
    Calling it multiple times would produce a different result (the result being a 160-bit digest: 20 bytes, usually written as 40 hex characters).

    >>what happens when I have to read a 100MB file and represent that file by a char*
    "You're RAM starts to fill up, and the OS swaps out memory to disk, making the whole thing inefficient."
    Inefficient but still plausible - which is what I really need.

    Thanks Hammer.
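
    In other words, the call pattern I'm stuck with looks roughly like this (hypothetical sketch; sha1() is only a placeholder declaration standing in for the real one-shot interface, and read_whole_file() is the whole-file reader sketched earlier in the thread):
    Code:
    #include <cstdlib>

    /* Placeholder declaration: the real interface takes the entire
       message in a single call, which is why the whole file has to
       be resident in memory at once. */
    void sha1(const unsigned char *msg, std::size_t len,
              unsigned char digest[20]);

    /* Whole-file reader sketched earlier in the thread. */
    unsigned char *read_whole_file(const char *path, std::size_t *out_len);

    void hash_file(const char *path)
    {
        std::size_t len;
        unsigned char *data = read_whole_file(path, &len);
        if (!data)
            return;

        unsigned char digest[20]; /* 160-bit (20-byte) result */
        sha1(data, len, digest);
        std::free(data);
    }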

  7. #7
    Registered User
    Join Date
    Jul 2002
    Posts
    13
    OOPboredom: to read a large file, use memory mapping. On Linux, that's mmap(). I don't know the memory mapping function for Windows.
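
    A minimal sketch of that approach (POSIX calls, error handling trimmed): the mapping gives you a pointer over the file contents without copying them into an allocated buffer, and the kernel pages the data in on demand as it is touched.
    Code:
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Map the file read-only and treat the mapping as an ordinary
       buffer of st.st_size bytes. */
    void hash_mapped_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;

        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0)
        {
            void *p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED)
            {
                const unsigned char *data = (const unsigned char *)p;
                (void)data; /* ...pass data and st.st_size to the hash... */
                munmap(p, st.st_size);
            }
        }
        close(fd);
    }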

