Searching a VERY large text file


  1. #1
    Goscinny or Uderzo?
    Join Date
    Jun 2004
    Posts
    33

    Searching a VERY large text file

    I'm writing a utility to search a very large text file. I only need to search the first few letters of each record to know if it is the one I need.

    The file I have to search is a text file by necessity due to other programmes that use it too.

    The problem is that the file is so large that I can't read it into a buffer without using up all of the virtual memory in Windows. I can't just increase the size of my pagefile because the application must be portable to other workstations in the company without the need for environment changes.

    I'm attempting to read the file in chunks, as is necessary, but nothing I have tried seems to work. This is what I have so far (excluding any extraneous information, of course):
    Code:
    #define MAX_LGT_EXT 50
    #define MAX_NO_EXT 1000
    ......
       int i, j, success=1;
       char **buffer, Image[MAX_LGT_EXT], search[15], serial[5], refresh[2];
       FILE *fopen(), *file_list;
    ......
       if( ( file_list = fopen("file_images.dat", "r") ) == NULL ){
          puts("\nError opening datafile: please ensure file_images.dat is in this directory\n");
          system("pwd");
          return 1;
       }
    
    
       buffer = malloc( MAX_NO_EXT * sizeof(char *) );
       if( buffer == NULL ){
          puts("\nBuffer error: out of memory\n");
          return 1;
       }
       for(i=0; i<MAX_NO_EXT; i++){
          buffer[i] = malloc( MAX_LGT_EXT * sizeof(char) );
          if( buffer[i] == NULL ){
             for(j = 0; j < i; ++j) {
                free(buffer[j]);
             }
             puts("\nBuffer error: out of memory\n");
             return 1;
          }
       }
    
    
    
       i=0;
       while( i<MAX_NO_EXT && (fgets(buffer[i], MAX_LGT_EXT, file_list) ) != NULL ) {
          i++;
       }
    
       puts("Please wait- searching");
       for(j=0; j<i && success!=0; j++){
          if( strncmp(buffer[j], "Bill", 4) == 0){
             strcpy(Image, buffer[j]);
          }
          if( strncmp(buffer[j], search, 13) == 0){
             printf("File: %s", buffer[j]);
             printf("%s", Image);
             success=0;
          }
          if( j==i-1 ){
             i=0;
             j=0;
             while( i<MAX_NO_EXT && (fgets(buffer[i], MAX_LGT_EXT, file_list) ) != NULL ) {
                i++;
             }
          }
       }
    ......
    The program compiles fine, but when I run this section of the code I get three errors.
    The first two say: The instruction at "a hex number" referenced memory at "a hex number". The memory could not be 'read'.
    The third says: The exception unknown software exception ("a hex number") occurred in the application at location "a hex number"

    If you need any more information or code just let me know.

    Any assistance anyone could offer would be much appreciated!
    Last edited by Tankndozer; 07-29-2004 at 02:24 AM. Reason: Thanks for your help Sebastiani ;)
    Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?

    (How much wood would a wood chuck cut if a wood chuck could chuck wood?)

  2. #2
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Well, this could be your problem:

    Code:
    buffer = malloc( MAX_NO_EXT * sizeof(char *) );
       for(i=0; i<MAX_NO_EXT; i++){
          buffer[i] = malloc( MAX_LGT_EXT * sizeof(char) );
       }  
       if( buffer == NULL ){
          puts("\nBuffer error: out of memory\n");
          return 1;
       }
    You perform your validation check on 'buffer' too late, after the loop has already written through it, and you never validate the allocations of the pointers within it at all.

    Code:
    buffer = malloc( MAX_NO_EXT * sizeof(char *) );
    if( buffer == NULL ){
       puts("\nBuffer error: out of memory\n");
       return 1;
    }
    for(i=0; i<MAX_NO_EXT; i++){
       buffer[i] = malloc( MAX_LGT_EXT * sizeof(char) );
       if( buffer[i] == NULL ){
          /* release the rows allocated so far, then the pointer table */
          for(j = 0; j < i; ++j) {
             free(buffer[j]);
          }
          free(buffer);
          puts("\nBuffer error: out of memory\n");
          return 1;
       }
    }
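A minimal sketch of the same check-as-you-go allocation, wrapped in two helper functions so that the cleanup lives in one place. The names `alloc_lines` and `free_lines` are made up for illustration and do not come from the thread. On failure, everything allocated so far is released, including the pointer table itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the first `count` rows and then the row table itself.
   Safe to call with lines == NULL. */
void free_lines(char **lines, size_t count)
{
    if (lines == NULL)
        return;
    for (size_t i = 0; i < count; ++i)
        free(lines[i]);
    free(lines);
}

/* Allocate `count` rows of `width` chars each.
   Returns NULL on failure, with nothing left allocated. */
char **alloc_lines(size_t count, size_t width)
{
    assert(count > 0 && width > 0);

    char **lines = malloc(count * sizeof *lines);
    if (lines == NULL)
        return NULL;

    for (size_t i = 0; i < count; ++i) {
        lines[i] = malloc(width);
        if (lines[i] == NULL) {
            free_lines(lines, i);   /* release rows 0..i-1 and the table */
            return NULL;
        }
    }
    return lines;
}
```

With helpers like these, the allocation in the original post would reduce to something like `buffer = alloc_lines(MAX_NO_EXT, MAX_LGT_EXT);` followed by a single NULL check.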
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  3. #3
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,555
    1. Why aren't you using standard text searching programs like grep, awk or perl?

    2. Why don't you start with something simple like
    Code:
    while ( fgets( buff, BUFSIZ, fp ) != NULL ) {
      if ( strncmp( buff, search, 13 ) == 0 ) {
        fputs( buff, stdout );
      }
    }
    All that memory allocation stuff is just complicating the issue.
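To make the suggestion concrete, here is a hedged sketch of the line-at-a-time loop extended with the "remember the last header line" behaviour from the original post (the line starting with "Bill" there). The function name `search_stream`, the `LINE_MAX_LEN` bound and the parameter names are assumptions for illustration only. It also assumes every record fits in `LINE_MAX_LEN - 1` characters, and unlike the original code it reports every match rather than stopping at the first one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed upper bound on record length */

/* Scan `in` one line at a time. Remember the most recent line that starts
   with `header`; for each line that starts with `key`, write that line and
   the remembered header line to `out`. Returns the number of matches. */
int search_stream(FILE *in, FILE *out, const char *header, const char *key)
{
    assert(in != NULL && out != NULL && header != NULL && key != NULL);

    char line[LINE_MAX_LEN];
    char last_header[LINE_MAX_LEN] = "";   /* empty until a header is seen */
    size_t header_len = strlen(header);
    size_t key_len = strlen(key);
    int matches = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strncmp(line, header, header_len) == 0)
            strcpy(last_header, line);     /* same size, cannot overflow */
        if (strncmp(line, key, key_len) == 0) {
            fprintf(out, "File: %s", line);
            fputs(last_header, out);
            ++matches;
        }
    }
    return matches;
}
```

Only two fixed-size arrays are live at any time, so memory use does not depend on the size of the file. A call might look like `search_stream(file_list, stdout, "Bill", search);`.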
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  4. #4
    Goscinny or Uderzo?
    Join Date
    Jun 2004
    Posts
    33
    Quote Originally Posted by Sebastiani
    You perform your validation check on 'buffer' too late, after the loop has already written through it, and you never validate the allocations of the pointers within it at all.
    Thanks for pointing that one out. Just for the sake of good practice I should be performing that check.

    Unfortunately this isn't the problem in this case. The memory allocation works fine (I know this as I can read the first 1000 lines into the buffer without any problems).

    The error seems to be arising when I try to read another 1000 lines into the same buffer.

    Quote Originally Posted by Salem
    1. Why aren't you using standard text searching programs like grep, awk or perl?

    2. Why don't you start with something simple like
    Code:
    while ( fgets( buff, BUFSIZ, fp ) != NULL ) {
      if ( strncmp( buff, search, 13 ) == 0 ) {
        fputs( buff, stdout );
      }
    }
    All that memory allocation stuff is just complicating the issue.
    Thanks for your suggestions, Salem. Unfortunately I can't use grep because of the context-searching requirement and because this is only a small part of a much larger program. Other functions have to be performed on several of the lines (e.g. other required information must be retrieved), so the buffer is required, and calls to other functions will eventually be made to deal with each group of lines.

    Using awk or Perl would make sense, but as a lowly application developer that's unfortunately not my call to make! I'm just told what to develop and what to use. It has been made clear to me that I'm only to use C.

    With regard to the memory allocation stuff: I'm only hard-coding the value for the moment for ease of development. Eventually the buffer size has to be a user-defined value.

    So basically, I'm stuck with trying to figure out why a second group of 1000 lines cannot be read into the buffer in the code posted originally.

    Thanks for posting your ideas guys. It's very much appreciated!!

  5. #5
    Goscinny or Uderzo?
    Join Date
    Jun 2004
    Posts
    33
    OK, it's all good. I've got it working now. Apparently the for loop at the end wasn't taking too kindly to my resetting the value of j. I've rewritten it as a while loop and the whole code is working fine now. Thanks again for your help guys.
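For readers wondering what resetting j inside the for loop does: the loop's own j++ still runs after the reset, so j is already 1 on the next pass and the line at index 0 of every refilled chunk is never examined. The poster's rewritten code was not shown, so the following is only one possible restructuring, not the actual fix: an outer while loop refills the buffer and an inner loop scans it from index 0. The names `fill_chunk` and `count_matches_chunked` and the tiny `CHUNK_LINES` value are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_LINES 4    /* small on purpose; the thread uses 1000 */
#define LINE_LEN    50   /* same value as MAX_LGT_EXT in the thread */

/* Fill `buf` with up to CHUNK_LINES lines; return how many were read. */
size_t fill_chunk(FILE *in, char buf[][LINE_LEN])
{
    size_t n = 0;
    while (n < CHUNK_LINES && fgets(buf[n], LINE_LEN, in) != NULL)
        ++n;
    return n;
}

/* Count the lines in `in` that start with `key`, reading chunk by chunk. */
size_t count_matches_chunked(FILE *in, const char *key)
{
    assert(in != NULL && key != NULL);

    char buf[CHUNK_LINES][LINE_LEN];
    size_t key_len = strlen(key);
    size_t matches = 0;
    size_t n;

    /* Outer loop: one pass per chunk; ends when a refill reads nothing. */
    while ((n = fill_chunk(in, buf)) > 0) {
        /* Inner loop: visits every index 0..n-1, including index 0. */
        for (size_t j = 0; j < n; ++j) {
            if (strncmp(buf[j], key, key_len) == 0)
                ++matches;
        }
    }
    return matches;
}
```

Because the refill happens in the outer loop's condition, the inner index never has to be reset by hand, and the end of the file is detected as soon as a refill returns zero lines.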