Thread: C programming to create buckets of strings from a very large text file

  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    3

    C programming to create buckets of strings from a very large text file

    I have an assignment to complete. It says I have to read a file that contains millions and millions of strings.
    I have to read the file and build a structure to hold the strings. This system must be able to answer the question "is this new string present?"
    I am also expected to break the list down into "buckets" of strings so the 'string to match' is able to choose the correct bucket to search in (quickly), and each bucket should contain no more than total/hashMask strings or so (ie 3,000,000 / 0xFFF == 732 objects per bucket).
    So far I have created a hash table structure and a function to read the file, but after that I am sort of clueless.
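A quick check of the bucket arithmetic, as a sketch (the function names are made up for illustration). With hashMask = 0xFFF, computing the bucket as hash & hashMask yields indexes 0 through 0xFFF, i.e. 0x1000 = 4096 buckets. Dividing 3,000,000 by 4096 also gives about 732, so the estimate holds whether you divide by the mask or by the bucket count. This assumes hashMask has the form 2^k - 1.

```c
/* Hypothetical helpers for the bucket-size arithmetic. hashMask is
   assumed to be of the form 2^k - 1 (0xFFF, 0x3FF, ...), so that
   hash & hashMask keeps exactly the low k bits of the hash. */

/* Bucket index for a hash: always in the range 0..hashMask. */
unsigned bucket_index(unsigned long hash, unsigned hashMask)
{
    return (unsigned)(hash & hashMask);
}

/* Masking with hashMask can produce hashMask + 1 distinct indexes. */
unsigned long bucket_count(unsigned hashMask)
{
    return (unsigned long)hashMask + 1;
}

/* Average strings per bucket, assuming the hash spreads them evenly. */
unsigned long average_load(unsigned long total, unsigned hashMask)
{
    return total / bucket_count(hashMask);
}
```

The even-spread assumption is what the choice of hash function has to deliver. A poor hash can leave some buckets far above the average.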
    Below is my sample code

    Code:
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_NAME 100

    /* One node in a bucket's linked list of strings. */
    typedef struct hashTable
    {
        char key[MAX_NAME];
        struct hashTable *next;
    } HashList;

    /*
      This function will read the file (assume one string per line)
      and create the list of lists (list of buckets), adding one object per string.
    */
    HashList *loadDataSet(const char *filename, int hashMask)
    {
        char readString[MAX_NAME];   /* buffer for one line of the file */
        FILE *fp;

        if ((fp = fopen(filename, "r")) == NULL)
        {
            perror("failed to open the file");
            exit(EXIT_FAILURE);
        }
        /* fgets already reserves room for the '\0', so pass the full size */
        while (fgets(readString, sizeof readString, fp) != NULL)
        {
            /* Need to break the list down into "buckets" of strings so the
               'string to match' is able to choose the correct bucket to
               search in (quickly), and each bucket should contain no more
               than total/hashMask strings or so
               (ie 3,000,000 / 0xFFF == 732 objects per bucket). */
        }
        fclose(fp);
        return NULL;   /* placeholder: no buckets are built yet */
    }
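One detail matters before any hashing happens: fgets keeps the trailing newline in the buffer. If the newline is not removed, a stored "apple\n" will never match a query "apple". Here is a minimal sketch of a hypothetical helper, read_trimmed_line, that reads one line and trims it:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf and strip the trailing
   newline that fgets keeps (and a '\r' from files with CRLF endings).
   Returns 1 if a line was read, 0 at end of file or on a read error.
   Lines longer than size - 1 characters are split across calls. */
int read_trimmed_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first '\r' or '\n' */
    return 1;
}
```

The same trimming has to be applied to each query string, so that stored strings and queries are compared in the same form.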

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    So basically, you need to have something like
    Code:
    struct hashTable buckets[0x1000];  // a nearby prime would be a lot better
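Here is a sketch of one way to fill this in. It assumes each array slot holds a pointer to the head of that bucket's list, which is a slight variation on the array of structs above: empty buckets are simply NULL and need no special marker. It also assumes a prime bucket count near 0x1000 (4093). The names NBUCKETS and bucket_insert are hypothetical, and the caller is assumed to have already computed the bucket index from the string's hash.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 100
#define NBUCKETS 4093   /* a prime near 0x1000 */

/* Same shape as the original post's struct: one string per node. */
struct hashTable {
    char key[MAX_NAME];
    struct hashTable *next;
};

/* Each slot is the head of one bucket's linked list (NULL = empty).
   File-scope arrays start zeroed, so every bucket begins empty. */
struct hashTable *buckets[NBUCKETS];

/* Push a copy of s onto the front of bucket `index`.
   Returns 0 on success, -1 if malloc fails. Strings longer than
   MAX_NAME - 1 characters are truncated. */
int bucket_insert(struct hashTable *table[], size_t index, const char *s)
{
    struct hashTable *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    strncpy(node->key, s, MAX_NAME - 1);
    node->key[MAX_NAME - 1] = '\0';   /* strncpy may not terminate */
    node->next = table[index];        /* old head becomes the second node */
    table[index] = node;
    return 0;
}
```

Pushing onto the front keeps each insert O(1) no matter how long the bucket has grown.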
    Next, you need a hash function to take your string, and give you an index into your array. There is a lot of research out there on the best way(s) to do this, so that you get a pretty even distribution for any sample word list.
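As one concrete option, 32-bit FNV-1a is a short, widely used string hash. It is my choice for this sketch, not something the reply prescribes. Reducing the hash with % instead of a mask lets the bucket count be a prime such as 4093, in line with the "nearby prime" remark. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a string hash, using the published FNV offset basis
   and FNV prime. */
uint32_t fnv1a_hash(const char *s)
{
    uint32_t h = 2166136261u;              /* FNV offset basis */
    while (*s != '\0') {
        h ^= (uint32_t)(unsigned char)*s++; /* mix in one byte */
        h *= 16777619u;                     /* FNV prime */
    }
    return h;
}

/* Map a string to a bucket index in 0..nbuckets-1; assumes nbuckets > 0. */
size_t bucket_for(const char *s, size_t nbuckets)
{
    return (size_t)(fnv1a_hash(s) % nbuckets);
}
```

The same function must be used both when loading the file and when answering a query. Otherwise a string will be looked for in a different bucket from the one it was stored in.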

    Having found a bucket, you do a linear search.
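Here is a minimal sketch of that linear search. It assumes the same node struct as the original post and a NULL-terminated list per bucket; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 100

struct hashTable {
    char key[MAX_NAME];
    struct hashTable *next;
};

/* Walk one bucket's list and compare each key; returns 1 if s is found.
   Only this one bucket is scanned, so the cost grows with the bucket's
   length (roughly 732 nodes in the assignment's example), not with the
   size of the whole file. */
int bucket_contains(const struct hashTable *head, const char *s)
{
    for (const struct hashTable *p = head; p != NULL; p = p->next) {
        if (strcmp(p->key, s) == 0)
            return 1;
    }
    return 0;
}
```

To answer "is this new string present?", hash the trimmed query the same way as during loading, then pass that bucket's head to bucket_contains.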
