Thread: efficiency problem

  1. #1
    Registered User
    Join Date
    Jul 2002
    Posts
    8

    efficiency problem

    My questions is how can I make this program more efficient. I have ran out of ideas of what to do. This program takes way too long to execute. I just starting coding in C 3 months ago so I don't know what else to do.

    This code is part of a bigger program, and is the bottle neck. I have to call function meaningful about 2000 times, which checks if the argument passed to it is actually a word. The function goes through a dictionary to see if str is actually a word.

    I would appreciate any help I can get on this to make it faster.


    Code:
    #include <stdio.h>
    #include <string.h>
    
    main()
    {
    	void meaningful(char *);
    
    	char hello[20] = {'b','r','e','a','d'};
    	int i, ticktok = 0;
    	clock(NULL);
    	for(i = 0; i < 2000; i++)
    		meaningful(hello);
    
    	ticktok = clock(NULL);
    	printf("Clocks: %i\n", ticktok);
    }
    
    void meaningful(char *str)
    {
    	char temp[20];
    	int i = 0, match=0, t;
    
    	FILE *fp;
    	fp = fopen("/usr/share/lib/dict/words", "r");
    
    	while(fscanf(fp, "%s", temp) != EOF){
    
    		if(isupper(temp[i]))
    			temp[i] = tolower(temp[i]);
    
    		if(temp[0] != str[0])
    			continue;
    		if(temp[1] != str[1])
    			continue;
    		if(temp[2] != str[2])
    			continue;
    
    		t = strcmp(temp,str);
    
    		if(t == 0){
    			//printf("****match found*****\n");
    			//printf("%s\n",temp);
    			break;
    		}
    		if(t > 0)
    			break;
    	}
    	fclose(fp);
    }

  2. #2
    Registered User moi's Avatar
    Join Date
    Jul 2002
    Posts
    946
    fpoen() once outside of meaningful(), and keep it open until you're done

    other than that the actual searching of the dictionary is probably pretty slow. someone else might be able to point you to some good info on advanced search techniques, hash tables etc, but i know not of these things.

  3. #3
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Well, first you have the usual suspects such as all the testing you do in meaningful and opening and closing the file so much. All of which can be fixed simply by doing something like this:
    Code:
    int meaningful(const char *str, FILE *fp)
    {
      char temp[20];
    
      /*
      ** This function checks for an exact match
      ** with the dictionary, no modification of the
      ** argument is made.
      */
      while(fscanf(fp, "%s", temp) != EOF)
        if(strcmp(temp,str) == 0)
          return FOUND;
      return NOTFOUND;
    }
    This could still be a bottleneck because you perform a linear search of the file so many times. A quicker way would be to open the file once and load its contents into a binary tree, then close the file. Now you can perform a speedy binary search of the tree instead of a slow linear search of the file.
    Code:
    int meaningful(const char *str)
    {
      /*
      ** This function checks for an exact match
      ** with the dictionary, no modification of the
      ** argument is made.
      */
      return searchTree(str);
    }
    Where searchTree returns a FOUND or NOTFOUND value.

    -Prelude
    My best code is written with the delete key.

  4. #4
    Registered User
    Join Date
    Jul 2002
    Posts
    8
    If I put the contents of the dictionary into a BST, how would I make it so the BST would be balanced? Since the dictionary is already in order, I would have a BST that would look like a linked list.

    Oh, and I have no idea what hash tables are.

  5. #5
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Since the dictionary is already in order, I would have a BST that would look like a linked list.
    That's bad of course, look into AVL trees. I ran a little test program and searching a small wordbase with an AVL tree 2 million times took about 7/10 of a second. I imagine such performance would remove your bottleneck? I'm sure you can find detailed information on AVL trees all over the web, so I won't post the code here.

    -Prelude
    My best code is written with the delete key.

  6. #6
    geek SilentStrike's Avatar
    Join Date
    Aug 2001
    Location
    NJ
    Posts
    1,141
    What about a hash table? They are easier to code IMO (rebalancing an AVL tree is icky), and the lookup of a well-performing hash is O(1) rather than O(lg(n)), so it scales with the size of the dictionary better.
    Prove you can code in C++ or C# at TopCoder, referrer rrenaud
    Read my livejournal

  7. #7
    Registered User
    Join Date
    Jul 2002
    Posts
    8
    Thanks for the help. This assignment is due tomarrow and the execution time is worth 40% of the assignment. I think I'll be getting a 60% on this assignment because I don't think I can code a hash table or an AVL tree in time. hehe

  8. #8
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >rebalancing an AVL tree is icky
    I hear you there. Maintaining a balanced tree is about as icky as it gets.

    >so it scales with the size of the dictionary better
    You also have to consider insertion and deletion. If the OP's program adds or removes words from the dictionary then the execution time of a hash table is probably less efficient than a tree. But I agree that if all the program does is search for words then a hash table is ideal.

    Here's how a hash table works, play with it until you understand. But don't turn in my code because you may be asked to explain how it works. If you can't then you'll get a failing grade for stealing someone else's code.
    Code:
    /*
    ** Please excuse any mistakes, I just threw this together.
    */
    #include <time.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define NOTFOUND 0
    #define FOUND    1
    #define DICTSIZE 13
    #define MWORDLEN 20
    #define FILENAME "anyfile.dat"
    
    static char **hashTable;
    static int maxElements;
    
    static void initT ( int N )
    {
      int i;
      maxElements = N * 2;
      hashTable = malloc ( maxElements * sizeof *hashTable );
      if ( hashTable != NULL )
        for ( i = 0; i < maxElements; i++ )
          hashTable[i] = NULL;
    }
    
    static int hash ( const char *word )
    {
      int hashValue = 0, base = 127;
      while ( *word != '\0' )
        hashValue = ( base * hashValue + *word++ ) % maxElements;
      return hashValue;
    }
    
    static void addT ( const char *word )
    {
      int i = hash ( word );
      while ( hashTable[i] != NULL )
        i = ( i + 1 ) % maxElements;
      hashTable[i] = malloc ( strlen ( word ) + 1 );
      if ( hashTable[i] != NULL )
        strcpy ( hashTable[i], word );
    }
    
    static int searchT ( const char *word )
    {
      int i = hash ( word );
      while ( hashTable[i] != NULL ) {
        if ( strcmp ( hashTable[i], word ) == 0 )
          return FOUND;
        else
          i = ( i + 1 ) % maxElements;
      }
      return NOTFOUND;
    }
    
    static void freeT ( void )
    {
      if ( hashTable == NULL )
        return;
      while ( maxElements-- > 0 )
        free ( hashTable[maxElements] );
      free ( hashTable );
    }
    
    static int meaningful(const char *str)
    {
      return searchT(str);
    }
    
    int main ( void )
    {
      int i;
      FILE *fp;
      clock_t s, e;
      char *test = "bread",
           buf[MWORDLEN];
      /*
      ** Build a hash table from the file.
      */
      initT ( DICTSIZE );
      fp = fopen ( FILENAME, "r" );
      if ( fp != NULL ){
        while ( fgets ( buf, sizeof buf, fp ) != NULL ) {
          buf[strcspn(buf, "\n")] = '\0';
          addT ( buf );
        }
        fclose ( fp );
      }
    
      /* Search the table...a lot. */
      s = clock();
      for ( i = 0; i < 2000000; i++ )
        meaningful ( test );
      e = clock();
    
      printf ( "Cycles: %ld\n", (e - s) );
      /* Clean up */
      freeT ();
      return 0;
    }
    -Prelude
    My best code is written with the delete key.

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > Since the dictionary is already in order
    The simplest thing then is to read the dictionary into an array, and search it using the standard library function bsearch (in stdlib.h).

    All you need is the compare function, which has the same interface as qsort.

    The only tricky bit is counting the number of words to allocate the array in the first place, but see past posts for help on dynamically created arrays

  10. #10
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    If everything is in order you really shouldn't be having too many speed problems. What you could do is put each definition on its own line and make each line the same length. This way you could just do some very simple pointer arithmetic to skip over lines. I dunno, you probably already turned in the assignment.

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You could also use a hash table (26 buckets, one for each letter), which would cut your search time a great deal.

    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Sayeh
    Guest
    I agree with Quzah. This is pretty standard database file-structures type stuff.

    Open the file once before you do the search. Build an n-way height-balanced tree for your data if the database isn't too large (that is, if it will fit in RAM for the ultimate in speed). And create a hash-table.

    As for pure execution speed, here are a few basics:

    -- minimize or eliminate stack time
    -- do things once that can be done once, not over and over
    -- minimize or eliminate tests (decisions take time)

    There are other things, but these will take a big bite out of time loss.

    There are ultimately only 2 roads known to improving performance, and _all_ performance enhancements eventually travel one or both of these roads. Each path has a cost. You either sacrifice intelligence for brute force (an example would be say performing the same instruction 100 times in a row, rather than using a loop because this eliminates expensive tests), or you sacrifice RAM (use lots of it, such as by using tables or trees variables arranged specifically to help you) so that you are performing lookups rather than calculations and tests.

    Small performance gains, a few microseconds here or there, don't seem like much, but if you have to do something 2 million times-- it adds up.

    I and a few others here, more than a year ago had a chance to demonstrate this to everyone with a little impromptu contest (it turned into one) whereby we just reversed the order of the bits in a byte.

    As I recall, a developer here used C++ and ran the program 10000 times-- it took aproximately 46 seconds. I used brute-force C and did the same test in about 12 seconds. The finalist chose to use table lookup and the test ran in just about 7 seconds. Assembler would have maybe shaved 2 seconds, had we gone that far.

    I traded intelligence, the finalist traded RAM. Very straightforward. And as it should be. Lookup is as fast as you can get.

  13. #13
    Unregistered
    Guest
    to me, both sound like increasing memory usage for increased speed.

  14. #14
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Originally posted by Unregistered
    to me, both sound like increasing memory usage for increased speed.
    Naturally. You can't have both.

    Quzah.
    Hope is the first step on the road to disappointment.

  15. #15
    Blank
    Join Date
    Aug 2001
    Posts
    1,034
    Probably too late but I've already done something like that http://www.cprogramming.com/cboard/s...highlight=hash

    I should have called a hash_delete function though.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Memory problem with Borland C 3.1
    By AZ1699 in forum C Programming
    Replies: 16
    Last Post: 11-16-2007, 11:22 AM
  2. Someone having same problem with Code Block?
    By ofayto in forum C++ Programming
    Replies: 1
    Last Post: 07-12-2007, 08:38 AM
  3. A question related to strcmp
    By meili100 in forum C++ Programming
    Replies: 6
    Last Post: 07-07-2007, 02:51 PM
  4. WS_POPUP, continuation of old problem
    By blurrymadness in forum Windows Programming
    Replies: 1
    Last Post: 04-20-2007, 06:54 PM
  5. Laptop Problem
    By Boomba in forum Tech Board
    Replies: 1
    Last Post: 03-07-2006, 06:24 PM