Thread: text file -> memory usage

  1. #1
    Registered User
    Join Date
    Jul 2011
    Location
    Croatia
    Posts
    16

    text file -> memory usage

    Hi, I am coding in C on Win7.
    If I'm right, Win7 does not count EOF (end of file) in the size of a text file (.txt)?
    E.g., if I have a simple text file with 3 characters:
    a
    b
    c
    this file is stored on disk as:
    a\r\nb\r\ncEOF

    That's 8 bytes total, but when I right-click the file and open Properties I get 7 bytes. Why?

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    There is no actual EOF character in your file*. EOF is just a value that read functions return to tell you when the input is over. EDIT: This is why functions like getchar actually return an int, to hold all possible char values and why you must declare your variables that store the result of getchar as ints. Read this: http://c-faq.com/stdio/getcharc.html.

    Some file systems may implement an "EOF character" to mark the end of the file, some may not. Some may simply keep track of the length and whether you've read all the bytes or not. C does not care about what kind of file system you're using, thus it won't care if there's an actual EOF character or not. The OS will tell whatever read functions you're using that it reached the end of the file. Even if a file system did use an EOF character, it would probably not count it in the file size, since it is metadata, not "actual" file data. You only put 7 bytes in the file, so that's all there is. If the OS/file system needs extra bytes to track other info about your file, it's not part of the file size.
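
    To make the point above concrete, here is a minimal sketch (not part of the original post) that writes the OP's three lines with explicit CR LF pairs and then counts what comes back. The helper names `write_sample` and `count_bytes` and the file name are made up for illustration. Note that the loop variable is an int, so that EOF can be told apart from every valid byte value.

```c
#include <stdio.h>

/* Hypothetical helper: writes the OP's example as a\r\nb\r\nc.
   Binary mode ("wb") keeps the runtime from translating line endings,
   so exactly the bytes listed here end up in the file. */
int write_sample(const char *path)
{
    static const char data[] = "a\r\nb\r\nc";
    size_t len = sizeof data - 1;          /* drop the string's '\0' */
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: counts the bytes getc() delivers before it
   returns EOF. Returns -1 if the file cannot be opened or read. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;                                 /* int, not char: EOF is out of band */
    while ((c = getc(fp)) != EOF)
        count++;

    int failed = ferror(fp);               /* EOF can also mean a read error */
    fclose(fp);
    return failed ? -1 : count;
}
```

    Calling `write_sample("eof_demo.txt")` and then `count_bytes("eof_demo.txt")` should report 7: the loop ends because getc returned EOF, not because an eighth byte was read from the file.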
    Last edited by anduril462; 03-16-2012 at 04:10 PM.

  3. #3
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    C does not care about what kind of file system you're using, thus it won't care if there's an actual EOF character or not.
    It's easy to prove that this is not the case. Create a text file, and somewhere in the middle of it insert a ^Z character (byte with value 0x1A). Then write a little C program that opens the file in text mode and reads from the file. At least on Microsoft's runtime, as soon as it hits that ^Z character it will treat it as the end of the file. The only way to read past it is to open the file in binary mode.
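
    The experiment described above could be sketched roughly like this. The helper names and the file name are invented for illustration, and the result of the text-mode read depends on the C runtime in use.

```c
#include <stdio.h>

/* Hypothetical helper: writes 'a', 'b', a 0x1A byte (^Z), 'c', 'd'.
   That is five bytes in total, written in binary mode so nothing
   is translated on the way out. */
int write_ctrl_z_sample(const char *path)
{
    static const unsigned char data[] = { 'a', 'b', 0x1A, 'c', 'd' };
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, sizeof data, fp);
    if (fclose(fp) != 0 || written != sizeof data)
        return -1;
    return 0;
}

/* Hypothetical helper: opens the file with the given fopen mode
   ("r" for text, "rb" for binary) and counts how many characters
   getc() delivers before it returns EOF. Returns -1 on open failure. */
long count_until_eof(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        count++;

    fclose(fp);
    return count;
}
```

    With mode "rb" the count should be 5 on any platform. With mode "r", Microsoft's runtime is expected to stop at the ^Z and report 2, while a Unix-like system, where the two modes behave the same, should still report 5.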
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    Registered User
    Join Date
    Jul 2011
    Location
    Croatia
    Posts
    16
    Thanks, I see... Can fread() read past the end of the file and potentially segfault, or does it stop when EOF is reached?

  5. #5
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    fread won't read past the end of the file. If you try to read more data than there is left in the file, it will return a short item count, i.e. it will return how many items it actually read instead of how many you asked for. For example, if you try to read 10 bytes and fread returns 2, you need to handle that. Note that fread's return value doesn't distinguish between an error and end of file, so use feof() and ferror() to check what happened. Read the documentation for details: fread man page.

    A call to fread can cause a seg fault, however, if you aren't careful about where you store the data you read. If you store it in some buffer/array and try to write past the end of that, or you're simply trying to store the data through an invalid pointer, you may cause a seg fault.
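
    As a rough sketch of the handling described above (the function name `read_chunk` and its status codes are made up for this example): the read never asks for more than the buffer can hold, and a short count is sorted into "end of file" or "error" with ferror().

```c
#include <stdio.h>

/* Hypothetical helper: reads up to cap bytes into buf, where cap must
   not exceed the real size of buf. Returns the number of bytes read
   and sets *status:
     0 = the full request was satisfied
     1 = short count because end of file was reached
     2 = short count because of a read error */
size_t read_chunk(FILE *fp, unsigned char *buf, size_t cap, int *status)
{
    size_t got = fread(buf, 1, cap, fp);   /* item size 1, so got is in bytes */

    if (got == cap)
        *status = 0;
    else if (ferror(fp))
        *status = 2;
    else
        *status = 1;                       /* no error, so feof(fp) is set */

    return got;
}
```

    With a 10-byte buffer and a file that holds only 2 bytes, `read_chunk(fp, buf, sizeof buf, &status)` should return 2 with status 1. Passing `sizeof buf` as the cap is what keeps fread from writing past the end of the buffer.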

  6. #6
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by brewbuck View Post
    It's easy to prove that this is not the case. Create a text file, and somewhere in the middle of it insert a ^Z character (byte with value 0x1A). Then write a little C program that opens the file in text mode and reads from the file. At least on Microsoft's runtime, as soon as it hits that ^Z character it will treat it as the end of the file. The only way to read past it is to open the file in binary mode.
    This is entirely OS/compiler specific (not the EOF character itself, but the text vs binary mode). BSD doesn't have a difference between text and binary modes. (fopen)


    Quzah.
    Last edited by quzah; 03-16-2012 at 08:26 PM.
    Hope is the first step on the road to disappointment.

  7. #7
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by quzah View Post
    ... BSD doesn't have a difference between text and binary modes.
    That is true of every Unix, not just BSD. All *nixes treat a file as an unformatted stream of bytes.

  8. #8
    Registered User
    Join Date
    Jul 2011
    Location
    Croatia
    Posts
    16
    Quote Originally Posted by anduril462 View Post
    fread won't read past the end of the file. If you try to read more data than there is left in the file, it will return a short item count, i.e. it will return how many items it actually read instead of how many you asked for. For example, if you try to read 10 bytes and fread returns 2, you need to handle that. Note that fread's return value doesn't distinguish between an error and end of file, so use feof() and ferror() to check what happened. Read the documentation for details: fread man page.

    A call to fread can cause a seg fault, however, if you aren't careful about where you store the data you read. If you store it in some buffer/array and try to write past the end of that, or you're simply trying to store the data through an invalid pointer, you may cause a seg fault.
    Thanks, problem solved.

  9. #9
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by quzah View Post
    This is entirely OS/compiler specific (not the EOF character itself, but the text vs binary mode). BSD doesn't have a difference between text and binary modes. (fopen)
    The fact that the modes even exist is evidence that you need to be prepared for them to be different. Thus portable code has to assume they are different, not that they are the same. Which means weird things can happen, such as an apparent EOF in the middle of a file, when opened in the wrong mode on Windows.

  10. #10
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by brewbuck View Post
    It's easy to prove that this is not the case. Create a text file, and somewhere in the middle of it insert a ^Z character (byte with value 0x1A). Then write a little C program that opens the file in text mode and reads from the file. At least on Microsoft's runtime, as soon as it hits that ^Z character it will treat it as the end of the file. The only way to read past it is to open the file in binary mode.
    That doesn't actually prove anything, other than the fact that Windows runtimes still have vestiges from CP/M, which MS-DOS was based on*. My statement you were trying to refute, that C doesn't care whether there's an actual EOF character, was and is completely true, and is not disproved by your example in any way.

    C is ignorant of the underlying file system (part of the reason there's no standard C function to get file size, iterate through directory structures, etc). It simply asks the OS for data from the stream. The circumstances under which the OS reports end-of-file to the C library functions have absolutely nothing to do with C nor file mode, and everything to do with the file system.

    Furthermore, the OP's example made it pretty clear that the only things he put in the file were some letters and newlines, no 0x1a character. You've been on this board long enough to know that many newbies have the misconception that there is some magical, universal EOF character stored at the end of every file on every computer system everywhere, and that it is what functions like getchar return. More evidence of this misconception is the fact that the OP was asking why the "EOF character" was not reported in the file size when he examined the properties, i.e. he expected the mythical EOF character that wasn't there.

    The 0x1a byte, if present in a file, is counted as part of the file size, and doesn't cause the file system or OS to stop counting early (at least on NTFS -- I no longer have any FAT filesystems to check this on).


    * Without doing way more research than I care to on this topic, that's the best I could do to trace the origin of this behavior.
