Thread: Reading a File - FASTEST WAY POSSIBLE??

  1. #1
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195

    Reading a File - FASTEST WAY POSSIBLE??

    Hi, I'm still a bit new to programming in C++ and what I'm trying to achieve is to read a file into memory as fast as possible. Here is what I do:

    Code:
    ...
    
    
    	char * sMessage;
    	register long lSize;
    	long i;
    	FILE * fp;
    
    	if (argc > 1)
    	{
    		fp = fopen(argv[1], "rb");
    
    		// THIS FINDS THE SIZE OF THE FILE
    		fseek(fp, 0, SEEK_END);
    		lSize = ftell(fp);
    		rewind(fp);
    
    		printf("SIZE: %ld\n", lSize);
    
    		// ALLOCATES NEW MEMORY FOR THE ARRAY
    		sMessage = new char[lSize];
    
    		// THIS IS THE PART I WANT TO SPEED UP
    		// I READ THE CONTENTS FROM A BINARY FILE
    		// ONE BYTE AT A TIME
    		for (i = 0; i < lSize; i++)
    		{
    			sMessage[i] = getc(fp);
    		}
    
    		printf("READ: %ld\n", lSize);
    
    		fclose(fp);
    	}
    
    	return 0;
    I'm trying to speed up the reading process so it can read ~200MB files in only a couple of seconds. Does anyone have any suggestions????

  2. #2
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Since you know the size of the file you should be able to do this:
    Code:
    fread(sMessage, lSize, 1, fp);
    I don't know of a good C++ method, sorry. You should also check the return value of fread for errors.
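    For example, something along these lines (just a sketch, untested; I've swapped the size/count arguments so the return value is a byte count):
    Code:
    size_t nread = fread(sMessage, 1, lSize, fp);
    if (nread != (size_t)lSize)
    {
    	// short read: an I/O error or early end-of-file
    	if (ferror(fp))
    		perror("fread");
    }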

  3. #3
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195
    I tried using fread and when it comes to files that are 200MB big, it's a bit sluggish, so I just use a loop and read it one byte at a time.

    Since the buffer on most machines is quite small, around 512KB, a 200MB file is not going to fit in there, so the machine has to read from the file many times before my program can play around with it.

  4. #4
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Well, if you are on a *nix machine you might try read() (can't remember if it's available on Windows or not). And if you can only read 512KB at a time, just make several calls to fread() in a loop. A few hundred large fread() calls will be far faster than making 200 million calls to getc(). Also, if you need to load an entire 200MB file into memory you really need to rethink your solution.
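    Something like this, roughly (an untested sketch; CHUNK is just a size I picked, and sMessage is assumed to already hold lSize bytes):
    Code:
    #define CHUNK (512 * 1024)    /* read in 512KB pieces */
    
    size_t total = 0;
    size_t n;
    while ((n = fread(sMessage + total, 1, CHUNK, fp)) > 0)
    {
    	total += n;    /* when the loop ends, total should equal the file size */
    }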

  5. #5
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195
    Well, what do you guys suggest? All my program does right now is compute a CRC32 for large files. If you have any other methods of approaching this then by all means...

    The actual algorithm takes little time to run on my machine; the file reading is a lot more time consuming.

  6. #6
    Me -=SoKrA=-'s Avatar
    Join Date
    Oct 2002
    Location
    Europe
    Posts
    448
    Try this and see what you get:
    Code:
    //assuming we're using namespace std
    ifstream file("file.txt", ios::in | ios::binary);
    
    //go to the end of the file
    file.seekg(0, ios::end);
    //and get the cursor position
    //which gives us the size
    int length = file.tellg();
    
    //we use buf as our buffer. We add
    //1 to the length for the null-terminator
    char* buf = new char[length + 1];
    //go back to the beginning of the file
    file.seekg(0, ios::beg);
    
    //now read everything in one go
    file.read(buf, length);
    buf[length] = '\0'; //don't forget this
                        //or we might run off into memory we don't own
                        //which is a definite no-no
    
    //and close the file
    file.close();
    I recently used it in a project; although it compiles, I'm not sure how efficient it is, because I've not been able to test it.
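    If you'd rather not manage the buffer by hand, roughly the same thing works with a std::vector (again just a sketch, assuming the file opened and is non-empty):
    Code:
    #include <fstream>
    #include <vector>
    
    std::ifstream file("file.txt", std::ios::in | std::ios::binary);
    
    file.seekg(0, std::ios::end);
    std::streamsize length = file.tellg();
    file.seekg(0, std::ios::beg);
    
    std::vector<char> buf(length);   // buffer is released automatically
    file.read(&buf[0], length);
    file.close();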
    SoKrA-BTS "Judge not the program I made, but the one I've yet to code"
    I say what I say, I mean what I mean.
    IDE: emacs + make + gcc and proud of it.

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > I'm trying to speed up the reading process so it can read ~200MB files in only a couple of seconds.
    Unless you've got the latest kit, I doubt you'll get anywhere near that.

    > Since the buffer on most machines is quite small, around 512KB
    Says who?
    And which buffer?
    You can't know in general how many buffers there are, let alone how big they are.

    What are you intending to do with the data once you've read it?
    If it's something simple like
    Code:
        for ( i = 0 ; i < len ; i++ ) {
            sum += buff[i];
        }
    Then it's a waste of time and resources to allocate memory for the whole file.

    A couple of ideas for you
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <time.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    
    #define BSIZE   BUFSIZ*100
    void calcs ( unsigned char *buff, size_t len ) {
        int sum = 0;
        size_t  i;
        for ( i = 0 ; i < len ; i++ ) {
            sum += buff[i];
        }
        printf( "Sum=%d\n", sum );
    }
    
    void exp1 ( char *filename ) {
        FILE *fp;
        fp = fopen( filename, "rb" );
        if ( fp ) {
            long len;
            unsigned char *mem;
            fseek( fp, 0, SEEK_END );
            len = ftell(fp);
            fseek( fp, 0, SEEK_SET );
            printf( "Len=%ld\n", len );
            mem = malloc( len );
            if ( mem ) {
                int n;
                unsigned char *p = mem;
                while ( (n=fread(p,1,BSIZE,fp)) > 0 ) {
                    p += n;
                }
                calcs( mem, len );
                free(mem);
            }
            fclose(fp);
        }
    }
    
    void exp2 ( char *filename ) {
        int fd;
        fd = open( filename, O_RDONLY );
        if ( fd != -1 ) {
            off_t len;
            unsigned char *mem;
            len = lseek( fd, 0, SEEK_END );
            mem = mmap( 0, len, PROT_READ, MAP_SHARED, fd, 0 );     /* map the whole file */
            if ( mem != MAP_FAILED ) {
                calcs( mem, len );
                munmap( mem, len );
            } else {
                perror("oops");
            }
            close(fd);
        }
    }
    
    int main ( int argc, char *argv[] ) {
        printf( "%d\n", clock() );
        exp1( argv[1] );
        printf( "%d\n", clock() );
        exp2( argv[1] );
        printf( "%d\n", clock() );
        return 0;
    }
    For me at least, the mmap() solution is twice as quick as fread()ing large blocks.

    EDIT
    So you are just touching each byte once. It's a waste of time reading the whole file into a huge buffer. Use a small buffer, about BUFSIZ in size, keep fread()ing into that, and run the next block of your CRC on each chunk as it comes in.
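    Roughly like this (sketch only; the table, the initial value, and the final XOR are placeholders, so match them to whichever CRC32 variant you're actually using):
    Code:
    #include <stdio.h>
    
    unsigned long crc32_stream ( FILE *fp, const unsigned long table[256] ) {
        unsigned char buf[BUFSIZ];
        unsigned long crc = 0xFFFFFFFFUL;   /* initial value - variant dependent */
        size_t n, i;
    
        while ( (n = fread(buf, 1, sizeof buf, fp)) > 0 ) {
            for ( i = 0 ; i < n ; i++ ) {
                crc = (crc >> 8) ^ table[(crc ^ buf[i]) & 0xFF];
            }
        }
        return crc ^ 0xFFFFFFFFUL;          /* final XOR - also variant dependent */
    }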
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195
    I will try your idea. The calculation I'm doing is the standard CRC32 algorithm.

    Code:
    for (i = 0; i < lSize; i++){
    	iCRC = ((iCRC >> 8) & 0xFFFFFFFF) ^ table[(iCRC ^ sMessage[i]) & 0xFF];
    }
    This part is extremely fast on my machine. I'll try the mapping feature instead.

  9. #9
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708

    Lightbulb

    >
    > Reading a File - FASTEST WAY POSSIBLE??
    >

    good old C (memory requirement: one byte) -

    Code:
    unsigned int copyfile(const char * to, const char * from)
    {
     int byte;
     unsigned int size = 0;
     FILE * in = fopen(from, "rb");
     if(!in) {
      return 0;
      }
     FILE * out = fopen(to, "wb");
     if(out) {
      chsize(fileno(out), 0);
      while(EOF != (byte = fgetc(in))) {
        fputc(byte, out);
        ++size;  
       }
      fclose(out);
      }  
     fclose(in);  
     return size;
    }
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  10. #10
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    I think Mastadex said: FASTEST WAY POSSIBLE??

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > This part is extremely fast on my machine. I'll try the mapping feature instead.
    As it should be - but irrelevant to the overall time it takes to read a file.

    Look at a randomly picked hard disk spec.
    It would take over a second to read your file even in the most perfect environment crafted to show the drive off in the best possible light. A hard disk can never keep a processor even remotely occupied for any sustained length of time.

    Throw an unoptimised, off-the-shelf, multitasking operating system and file system on top of it (can you say fragmented files?) and you start to see the problems.

    For example, a 10ms seek time equates to about for ( i = 0 ; i < 20E6 ; i++ ) of work on your 2GHz processor. That's like 10% of the work you need to perform, and you haven't even opened the file yet.

    Here's another 10% of your work gone - the OS decides to schedule some other activity for 10ms, whilst it waits for the disk to catch up.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195
    Unfortunately what I'm coding is Windows specific, so it won't be ported to Linux/Unix. Is there a Windows equivalent of the mmap function??

  13. #13
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Yes
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  14. #14
    Codebot
    Join Date
    Jun 2004
    Location
    Toronto
    Posts
    195
    And what would it be??

  15. #15
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    All answers will be shown to you once you CLICK THE LINK PROVIDED!
