Thread: a function that returns the number of bytes of a file

  1. #16
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    Counting characters isn't very effective though. You people need to act less elitist. For instance, tell the thread starter that there are better ways, but that you need more info to help him.

  2. #17
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Well the OP only wanted an 'isEmpty' test.

    So you attempt to read one character - no more, no less.
    If you read EOF, then the file is empty, otherwise it isn't.
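
    A minimal sketch of the one-character test described above, in C++ to match the other code in this thread. The name is_empty_file and the decision to report an unopenable file as "not empty" are assumptions made for illustration; they are not from the original posts.

```cpp
#include <fstream>
#include <string>

// Returns true if the file could be opened and holds no bytes at all.
// Assumption: a file that cannot be opened is reported as not empty,
// so callers that care about that case must check for it separately.
bool is_empty_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;  // could not open; emptiness is unknown
    }
    // Attempt to read exactly one character. get() returns EOF
    // when there is nothing to extract.
    return in.get() == std::ifstream::traits_type::eof();
}
```

    Only one read is attempted, so the cost does not depend on how large the file is.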
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #18
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Counting characters isn't very effective though.
    It's very effective. I think you meant efficient. Of course, then I would ask you for a profile that shows the solution to be a significant bottleneck in the application that warrants something more efficient.

    >You people need to act less elitist
    I'd love to hear why you think my original question was elitist. I will accept that correcting the blatantly incorrect suggestions of others could be viewed as elitist. Should I just stay silent and let everyone be wrong?

    >there are also better ways but that you need more info to help him
    There may not be better ways. It depends completely on what he's trying to accomplish, the tools available, and any restrictions, none of which were supplied as part of the original question.

    >Well the OP only wanted an 'isEmpty' test.
    The thread title suggests something more general.
    My best code is written with the delete key.

  4. #19
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Here is a little program I just used to test how inefficient, in terms of time, reading and counting all the characters really is. I went to gamefaqs.com, picked the longest FAQs I could find on one of their FAQ pages, and tested it on them.
    Code:
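    #include <iostream>
    #include <fstream>
    #include <ctime>
    
    int main() {
       // Generic filename as they varied with the results.
       // Opened in text mode; pass std::ios::binary to count raw bytes
       // on platforms that translate newlines.
       std::ifstream inFile("FILENAME.TXT");
       if (!inFile) {
          std::cerr << "Could not open the file.\n";
          return 1;
       }
       long long byteCount = 0; // wide enough for files over 2 GB
       char junk;
       
       std::clock_t start = std::clock();
       while (inFile.get(junk))
          byteCount++;
       std::clock_t end = std::clock();
       
       // CLOCKS_PER_SEC is the standard macro representing how many of the ticks
       // that clock() returns are in a second (CLK_TCK is an obsolete name for it)
       std::cout << "The file is " << byteCount << " bytes.\n"
                 << "This operation took "
                 << static_cast<double>(end - start) / CLOCKS_PER_SEC << " seconds.";
       
       return 0;
    }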
    Here are some results.
    Code:
    The file is 728238 bytes.
    This operation took 0.046 seconds.
    
    The file is 460328 bytes.
    This operation took 0.031 seconds.
    
    The file is 1158421 bytes.
    This operation took 0.078 seconds.
    Now that last one was over a megabyte. You could multiply that by 10 and still be under a second. At that rate a gigabyte would take a little over a minute, but for ordinary files, how much more efficient could you want it? It's clean, it's simple, it's effective.
    Sent from my iPad®

  5. #20
    Registered User OnionKnight's Avatar
    Join Date
    Jan 2005
    Posts
    555
    >It's very effective. I think you meant efficient. Of course, then I would ask you for a profile that shows the solution to be a significant bottleneck in the application that warrants something more efficient.
    Yeah I meant efficient. How about a detailed file listing of a directory where the total size of files is quite huge? Like a directory with ISOs.

    >I'd love to hear why you think my original question was elitist. I will accept that correcting the blatantly incorrect suggestions of others could be viewed as elitist. Should I just stay silent and let everyone be wrong?
    Nothing wrong with your question, and nothing wrong with correcting others. I'm thinking of when the higher-ups notice the slightest change of atmosphere or wrongdoing in the thread and have to turn it into a gigantic ........fest flamewar.

  6. #21
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You're not taking into consideration who we were talking to. Go read some of Fischer's previous threads. The most amusing ones are the early posts, their attempts at being "1337". Now we just treat them as the incorrect ass hat they are.


    Quzah.
    Last edited by quzah; 05-12-2006 at 03:49 PM. Reason: I can't worth a damn.
    Hope is the first step on the road to disappointment.

  7. #22
    Registered User fischerandom's Avatar
    Join Date
    Aug 2005
    Location
    Stockholm
    Posts
    71
    The physical file size may differ between file systems, operating systems, and media types, and even from time to time on the same system, because the OS usually aligns files to a page size, which is typically 2048, 4096 or 8192 bytes, etc. So the physical size a file occupies on the media is not that easy to estimate. For example, if we create a file on a hard drive containing one byte with the value zero, the physical file size as seen by the OS is probably a page. Let's say the page size is 4096 bytes and we create 4096 such one-byte files: the result may occupy 16777216 bytes on the hard drive, allocated by the OS of course. So that's one thing to keep in mind.
    A file with a logical size of 1 byte may physically occupy a whole page, depending on the media, which can vary GREATLY.
    So we cannot easily answer what the physical size of a file is.
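
    The arithmetic in the example above can be checked with a small sketch. It assumes the simplest possible model, in which every file is rounded up to a whole number of fixed-size allocation units; real file systems may store tiny files inline, compress them, or use other unit sizes. The name allocated_bytes is hypothetical.

```cpp
#include <cstdint>

// Bytes occupied by a file of logical_size bytes if storage is handed out
// in whole units of unit_size bytes (assumption: no inline storage,
// compression, or sparse-file handling).
constexpr std::uint64_t allocated_bytes(std::uint64_t logical_size,
                                        std::uint64_t unit_size) {
    return (logical_size + unit_size - 1) / unit_size * unit_size;
}

// The example from the post: 4096 one-byte files with 4096-byte units.
static_assert(4096 * allocated_bytes(1, 4096) == 16777216,
              "4096 one-byte files occupy 16 MiB under this model");
```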
    Can we get an accurate answer for the logical file size, whether the file is stored in binary or text format, on a particular system, without actually counting the bytes one by one, with the understanding that the answer is true on the system the file resides on but may be false on another system? Yes, but only if the number of characters used for '\n' (new-line) is consistent over the whole file, if they appear. Then the answer depends on how you want to read the file: if you want to read it as text, open the file with "r" and apply the fseek; if you want to read it as binary data, open the file with "rb" and apply the fseek. For example, if you plan to allocate a buffer in RAM large enough to hold all the data in the file, you will get it right this way. You, as implementer and designer, must know HOW you want to read the data. Compare the result with counting every byte. If the file is gigabytes in size, you really benefit!
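
    A sketch of the binary-mode seek approach described above, written with C++ streams rather than fopen/fseek to match the other code in this thread. The name size_by_seek and the -1 failure value are assumptions for illustration. Note the hedge: the language standards do not promise that seeking to the end of a stream yields a byte count, and in text mode the reported position need not be one, so treat this as something that works on mainstream desktop platforms rather than a portable guarantee.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Returns the position reported after seeking to the end of the file,
// opened in binary mode, or -1 if the file cannot be opened or the
// seek fails. On common platforms this equals the size in bytes.
std::streamoff size_by_seek(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return -1;
    }
    in.seekg(0, std::ios::end);        // jump to the end without reading
    const std::streamoff end = in.tellg();
    return in ? end : -1;
}
```

    Comparing the result against a byte-by-byte count on the same file is an easy way to see whether the two agree on a given system.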

    PS.... Don't ever try to time the efficiency of your code by using the standard library's clock( ), since you can never know how busy the system and other processes are when you time it. Use a tool to analyse the efficiency of your code instead. Metrowerks CodeWarrior has a profiler tool for that, etc.
    Last edited by fischerandom; 05-15-2006 at 04:56 PM.
    Bobby Fischer Live Radio Interviews http://home.att.ne.jp/moon/fischer/

  8. #23
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Do you try to BS your way through things in real life too?
    My best code is written with the delete key.
