Thread: Safely Reading A File Of Arbitrary Length Into Memory.

  1. #1
    Registered User
    Join Date
    May 2016
    Posts
    104

    Safely Reading A File Of Arbitrary Length Into Memory.

    So far I've been using a simple function I wrote that uses open, read and close to read a file into a buffer; no stdlib functions are allowed for me, only these system calls. No problems there.

    The project I'm working on however -an implementation of MD5- requires me to read files that may very well exceed the capacity of system RAM.

    I thought of working with chunks of 512-1024MB at a time, but splitting the work like this would require me to rewrite my algorithm, at least in part.

    Besides, even if I do, I would have to do some work at the end of the file before processing it (pad it and append some bytes to it), so I would still need to read the whole thing at least once with read just to get the file size. Then I would read again to store the last chunk, pad it and do the other necessary work on it; read again from the beginning, x chunks at a time, processing the message to get the digest; and finally process the last chunk, which I stored at the start.

    I was wondering if you guys have real life recommendations. Is splitting the workload really the best approach?

    Thanks.

  2. #2
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Dren
    The project I'm working on however -an implementation of MD5- requires me to read files that may very well exceed the capacity of system RAM.

    I thought of working with chunks of 512-1024MB at a time, but splitting the work like this would require me to rewrite my algorithm, at least in part.
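    If the file size could exceed memory, then you must split it up. The design of MD5 suggests that this was intended: it consumes the message in 64-byte (512-bit) blocks and carries only a small running state from one block to the next.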

    Quote Originally Posted by Dren
    Besides, even if I do, I would have to do some work at the end of the file before processing it (pad it and append some bytes to it), so I would still need to read the whole thing at least once with read just to get the file size. Then I would read again to store the last chunk, pad it and do the other necessary work on it; read again from the beginning, x chunks at a time, processing the message to get the digest; and finally process the last chunk, which I stored at the start.
    No, you don't need to read the entire file twice: you can process the earlier chunks while counting the file length. When you reach the final chunk, that's when you do the padding based on the computed original file length, and only then do you process the last chunk.
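
A minimal sketch of the padding step described above, assuming the earlier full 64-byte blocks have already been processed and the total byte count was kept while streaming. The names `md5_pad_tail` and `MD5_BLOCK` are hypothetical, not from the thread. The layout follows the MD5 padding rule: a single 0x80 byte, zero bytes, then the message length in bits as a 64-bit little-endian value. It avoids library calls such as memcpy to stay within the "system calls only" constraint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MD5_BLOCK 64

/* Hypothetical helper: builds the final padded block(s).
 * tail      - leftover bytes after the last full 64-byte block
 * tail_len  - number of leftover bytes, 0..63
 * total_len - length of the whole message in bytes, counted while streaming
 * out       - receives 64 or 128 bytes
 * Returns the number of 64-byte blocks written (1 or 2). */
static size_t md5_pad_tail(const uint8_t *tail, size_t tail_len,
                           uint64_t total_len, uint8_t out[2 * MD5_BLOCK])
{
    assert(tail_len < MD5_BLOCK);

    /* One block is enough if 0x80 plus the 8-byte length fit after the tail. */
    size_t blocks = (tail_len < MD5_BLOCK - 8) ? 1 : 2;
    size_t end = blocks * MD5_BLOCK;
    size_t i;

    for (i = 0; i < tail_len; i++)
        out[i] = tail[i];
    out[i++] = 0x80;                 /* the mandatory single 1 bit */
    for (; i < end - 8; i++)
        out[i] = 0;                  /* zero fill up to the length field */

    uint64_t bits = total_len * 8;   /* length in bits, wraps mod 2^64 */
    for (size_t k = 0; k < 8; k++)
        out[end - 8 + k] = (uint8_t)(bits >> (8 * k));   /* little-endian */

    return blocks;
}
```

The returned block or blocks would then go through the same per-block routine as every earlier block, so the digest algorithm itself does not need to change.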

  3. #3
    Registered User
    Join Date
    May 2016
    Posts
    104
    Quote Originally Posted by laserlight
    No, you don't need to read the entire file twice: you can process the earlier chunks while counting the file length. When you reach the final chunk, that's when you do the padding based on the computed original file length, and only then do you process the last chunk.
    Yes, of course!

    The way it works now, I have a pointer to a copy of the file primed and ready on the heap, which I feed into the digest algorithm 64 bytes at a time, the block size MD5's specification demands. But I could just as well read the file directly and pass it to my algorithm until I reach the last block; then I'll know the file size and be able to do the padding. Brilliant! I don't even need to dynamically allocate memory doing it this way.

    Thanks man!
    Kudos
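
The direct-read approach from post #3 might look roughly like the sketch below. It uses only the read system call and one fixed 64-byte buffer. `stream_blocks`, `block_fn` and `count_block` are hypothetical names, and `count_block` is only a stand-in for the real MD5 per-block step. The sketch assumes read may return fewer bytes than requested before end of file, so it keeps filling the buffer until a block is complete.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Callback invoked once per complete 64-byte block. */
typedef void (*block_fn)(const uint8_t block[64], void *ctx);

/* Stand-in for the real MD5 compression step: it only counts blocks. */
static void count_block(const uint8_t block[64], void *ctx)
{
    (void)block;
    ++*(size_t *)ctx;
}

/* Reads fd to end of file, passing every full 64-byte block to on_block.
 * On success, tail[0 .. *tail_len) holds the leftover bytes (0..63 of them)
 * and *total holds the number of bytes read. Returns 0, or -1 on error. */
static int stream_blocks(int fd, block_fn on_block, void *ctx,
                         uint8_t tail[64], size_t *tail_len, uint64_t *total)
{
    size_t fill = 0;

    *total = 0;
    for (;;) {
        assert(fill < 64);                     /* always room to read into */
        ssize_t n = read(fd, tail + fill, 64 - fill);
        if (n < 0) {
            if (errno == EINTR)
                continue;                      /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;                             /* end of file */
        fill += (size_t)n;
        *total += (uint64_t)n;
        if (fill == 64) {                      /* block complete */
            on_block(tail, ctx);
            fill = 0;
        }
    }
    *tail_len = fill;
    return 0;
}
```

The leftover bytes and the total are exactly what the padding step needs once end of file is reached. Reading 64 bytes per call keeps the sketch short; a larger fixed buffer (a few kilobytes, say) split into 64-byte blocks would cut the number of system calls and still need no heap allocation.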
