Thread: Doing a malloc from an ftell

  1. #1
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137

    Question Doing a malloc from an ftell

    I've reached this situation several times and just want to get some general advice.

    I want to perform a call to malloc which allocates the exact number of bytes of a file on disk. One issue is that ftell() returns a long but malloc takes a size_t. I feel like the naive but obvious approach is to just do something like this:

    Code:
    long filesize = ftell(fp);
    
    char *database = malloc((size_t)filesize);
    The issue here is if ftell returns -1, this is not a good situation for malloc. Another issue is that the user can control directly how much is malloc'd by feeding various size files into the parser, potentially mallocing a HUGE amount of memory. Of course for this second problem, I could decide on a max # of bytes and create some logic to prevent a huge allocation. But I'm not sure on how to gather info to make this decision.

    What are some better methods of accomplishing memory allocation based off of disk file size? I just need to map a database from disk to memory. I work as a security engineer professionally and I've seen some of the other engineers do a ftell output directly to a malloc but it created some security issues a few times, however, I've not yet discovered a better way of doing this aside from the aforementioned.

    Last but not least, can someone explain some background on some of these C functions with different return types? For example, I would expect ftell to return a size_t but it returns a long. I've noticed this is the case with several C library functions and it makes programming in C tricky at times until of course one has learned how to deal with each of these specific conversion situations. I assume that the long may be due to the large number of bytes that could be output from ftell, but then again, why not unsigned long or long long?

    Thank you.
    If I was homeless and jobless, I would take my laptop to a wifi source and write C for fun all day. It's the same thing I enjoy now!

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,633
    First, how are you opening the file: binary or text?

    Second, what operating system are you using?

    Third, how are you calling fseek()?

    Fourth, do you realize that a file may be much larger than all of your available memory?

    The issue here is if ftell returns -1, this is not a good situation for malloc.
    That's why you should always check the return value from this function. On failure ftell() returns -1L and sets errno, so if you get a negative value you shouldn't use it. And be careful not to confuse this with EOF, which is a separate macro that can be any negative int value.
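    A minimal sketch of that check, assuming the file was opened in binary mode and that the platform gives meaningful results for fseek() to SEEK_END on such a stream (the C standard does not strictly guarantee this). The helper name stream_size is hypothetical, not from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of an open binary stream,
 * or -1L if any step fails. On success the position is reset to the start. */
long stream_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;

    long size = ftell(fp);      /* -1L on failure, with errno set */
    if (size < 0)
        return -1L;

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;

    return size;
}
```

    The caller then only has one thing to test: a negative result means "do not use this value".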

    Another issue is that the user can control directly how much is malloc'd by feeding various size files into the parser, potentially mallocing a HUGE amount of memory.
    Yes, see above. But realize that since ftell() returns a long, the largest amount of memory you will be able to allocate with this method is the largest value a long can hold. But beware of any negative values.

    Of course for this second problem, I could decide on a max # of bytes and create some logic to prevent a huge allocation. But I'm not sure on how to gather info to make this decision.
    This is probably the best course; just ensure your "max" is sized properly so as not to exceed the maximums of size_t and long.
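    One way such a cap might look, as a sketch only. The 64 MiB limit and the names MAX_DB_BYTES and alloc_for_file are assumptions chosen for illustration; the right limit depends on the application.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap of 64 MiB; small enough to fit in a 32-bit long. */
#define MAX_DB_BYTES (64L * 1024L * 1024L)

/* Returns a buffer of exactly `filesize` bytes, or NULL if the size is
 * negative, zero, above the cap, not representable as size_t, or if
 * malloc itself fails. The caller frees the buffer. */
void *alloc_for_file(long filesize)
{
    if (filesize <= 0 || filesize > MAX_DB_BYTES)
        return NULL;

    /* filesize is positive here, so the cast to unsigned long is safe. */
    if ((unsigned long)filesize > SIZE_MAX)
        return NULL;

    return malloc((size_t)filesize);
}
```

    Rejecting negative values before the cast is what keeps an ftell() failure of -1L from turning into a request for SIZE_MAX bytes.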

    What are some better methods of accomplishing memory allocation based off of disk file size?
    Normally, trying to read a complete file into memory is a big mistake when dealing with large file sizes. Stick with a fixed-size "buffer", and when you finish processing the buffer, read in the next chunk.
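    A sketch of the fixed-buffer approach. Counting newline bytes stands in for whatever per-chunk processing is actually needed; CHUNK_SIZE and count_newlines are hypothetical names.

```c
#include <stdio.h>

#define CHUNK_SIZE 4096   /* hypothetical buffer size */

/* Reads the stream in fixed-size chunks and counts '\n' bytes.
 * Returns the count, or -1L if a read error occurred. Memory use
 * stays at CHUNK_SIZE bytes regardless of the file size. */
long count_newlines(FILE *fp)
{
    unsigned char buf[CHUNK_SIZE];
    long count = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == '\n')
                count++;
        }
    }

    return ferror(fp) ? -1L : count;
}
```

    Note that no file size is needed at all here, so the ftell()/malloc() conversion problem does not arise.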

    I just need to map a database from disk to memory.
    This is normally accomplished by "indexes" into the file.
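    One possible reading of "indexes", sketched under the assumption that the database consists of fixed-size records, so a record number maps directly to a byte offset. The struct layout and the name read_record are hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical fixed-size record layout. */
struct record {
    int  id;
    char name[28];
};

/* Reads record number `index` from a binary stream into *out.
 * Returns 0 on success, -1 if the index is invalid, the seek fails,
 * or a full record could not be read. */
int read_record(FILE *fp, long index, struct record *out)
{
    /* Reject negative indexes and offsets that would overflow a long. */
    if (index < 0 || index > LONG_MAX / (long)sizeof *out)
        return -1;

    if (fseek(fp, index * (long)sizeof *out, SEEK_SET) != 0)
        return -1;

    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

    Only one record is held in memory at a time; the file on disk remains the database.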

    Last but not least, can someone explain some background on some of these C functions with different return types?
    Possibly, let's start by addressing some of your other issues in that paragraph?

    For example, I would expect ftell to return a size_t but it returns a long.
    Why would you expect a size_t? Do you realize that a size_t is an implementation-defined unsigned type? Do you realize that it is possible to seek backwards (with a negative offset)?

    I've noticed this is the case with several C library functions and it makes programming in C tricky at times until of course one has learned how to deal with each of these specific conversion situations.
    Normally if you need to do "conversions" you're doing something wrong.

    I assume that the long may be due to the large number of bytes that could be output from ftell, but then again, why not unsigned long or long long?
    You're assuming something that is probably wrong. First, since you can access the file randomly, you need negative values (for example, a negative offset passed to fseek(), or the -1L that ftell() returns on failure), which are not supported by any unsigned type. Second, it is possible that a long cannot hold all positive values that fit in a size_t. For example, on my system a long can hold positive values in the range of 0 to 9223372036854775807 while a size_t can hold values in the range of 0 to 18446744073709551615, so if you try to "convert" an out-of-range size_t to a long, the result is implementation-defined and cannot be relied on.
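    A sketch of a range-checked conversion in that direction, for cases where a size_t really does have to become a long. The name size_to_long is hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stores n in *out as a long and returns 0 if the value fits,
 * otherwise returns -1 and leaves *out unchanged. Comparing against
 * (unsigned long)LONG_MAX keeps both operands unsigned, so the
 * comparison itself cannot go wrong. */
int size_to_long(size_t n, long *out)
{
    if (n > (unsigned long)LONG_MAX)
        return -1;

    *out = (long)n;
    return 0;
}
```

    On a system with the ranges quoted above, size_to_long(SIZE_MAX, &v) is rejected rather than silently producing a negative number.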
    Last edited by jimblumberg; 01-21-2019 at 03:23 PM.
