Thread: binary search function on string array

  1. #1
    Registered User
    Join Date
    Oct 2007
    Posts
    3

    binary search function on string array

    This program is a dictionary program. I am loading a list of words from a file, sorting as I insert. The program also has a search function that uses the same code. Some of the words inserted into the array are in the wrong order. Am I missing something with this function?

    This program is being compiled on Debian Linux with the 4.1.2 gcc compiler.

    Thanks in advance.

    Code:
    int indexOfWord( char ** dictionary, int count, char *word )
    {
       int lowerBound     = 0; 
       int upperBound     = count - 1;
       int midPoint       = 0;
       int strCompResults = 0;
    
       while (lowerBound < upperBound ) {
         midPoint       = (upperBound + lowerBound)/2;
         strCompResults = strcmp(word,dictionary[midPoint]);
    
              if ( strCompResults ==  0) return midPoint;
         else if ( strCompResults  >  0) lowerBound     = midPoint + 1;
         else                            upperBound     = midPoint - 1;
     }
    
      return midPoint * -1;
    }

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You'll want to verify that when you insert, that the next word is actually greater than the one you are inserting, and that the previous one is lesser.

    I'm pretty sure with those checks and some suitable printf(), you'll find what's wrong with your code [it is probably a -1 or +1 that is missing or "too much"].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Oct 2007
    Posts
    3
    I had a feeling it was a question of being +- 1. So I added an adjustment in the calling routine. I needed to multiply wordcount * -1 before doing the adjustment, but I am currently segfaulting. Any ideas?

    Code:
    void insertWord( char ***dictionary, int *count, int *capacity, char *word )
    {
       int wordLocation      = 0;
       int i = 0;
    
      if (*count == *capacity) /* need to double my array's capacity! */
             resizeDictionary(dictionary, capacity);
    
    /* find the location of where the word should reside. For the first few words the wordLocation
     * method does not seem to work, so I plan on doing the first few words manually. */
       wordLocation = indexOfWord( *dictionary, *count, word );
    
    
       if (wordLocation < 1 || *count == 0)
       {
    /*    A negative word location means that the word is not in the array */
          wordLocation *= -1;
    /* check to see if I am off by one, if so adjust the index. */
               if (wordLocation > 0 && wordLocation < *count && strcmp(word, *dictionary[wordLocation]+1) > 0) ++wordLocation;
          else if (wordLocation > 0 && wordLocation < *count && strcmp(word, *dictionary[wordLocation]-1) < 0) --wordLocation;
    
    /*    The for loop is written to start at the end and move every string out one position.
     *    thus freeing up a spot the the please returned by indexOfWordReturn */
          for (i=(*count)-1 ; i>=wordLocation ; --i )
          {
            (*dictionary)[i+1] = (*dictionary)[i];
          }
    /*    copy ptr to the correct location in the dictionary array */
          ++*count;
          (*dictionary)[wordLocation] = word;
       }
    }
    Last edited by gandolf989; 10-02-2007 at 09:12 AM.

  4. #4
    Registered User
    Join Date
    Oct 2007
    Posts
    4
    I don't know whether it is shear coincidence. I asked almost for a similar routine how it can be improvised. My logic is also more or less than same. Anyways coming to your point please find my observation. Also please note i have not tested it and i worked it out by scribbling on paper.

    Code:
    Assuming the element you want to search is at Position 9 and the number of elements is 18
    
    Elements -> 18
    Lower Bound -> 0
    Upper Bound -> (18 - 1) 17
    Mid Point -> (17 + 0) -> 8
    
    Search element is greater than position 8
    
    Lower Bound -> 9
    Upper Bound -> 17
    Mid Point ->   13
    
    Search element is less than position 13
    
    Lower Bound -> 9
    Upper Bound -> 12
    Mid Point   -> 10
    
    Search element is less than position 10
    
    Lower Bound -> 9
    Upper Bound -> 9
    
    Loop will exit (because of the while loop (lowerbound < upperbound)

    So replace the while loop with this condition and hopefully it should work. Please note it is not tested
    Code:
    (while lowerBound <= upperBound  )

    Hope it helps

    Regards

    Raj

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Ok, some thoughts:
    1. Write a linear search that finds the right positon. Call this function after you have got the result from your binary search. Compare the results.

    2. Instead of trying to fix up, just VERIFY that the conditions are correct (that the next/previous strings are greater/lesser than the one you are inserting).

    3. Write a piece of code that verifies that ALL strings are in order. Stop if they are not, with an indication of what strings and what position.

    Once you have identified a failing case, write a special case handling to start printing data when the failing case happens [e.g:
    Code:
     
    if (strcmp(word, "thisfails") == 0) logging = 1;  // replace "thisfails" with a selected case that does fail. 
    ... 
    
       if (logging) printf("midpoint = %d, searching at %s\n", midpoint, *dictionary[wordLocation]);
    ...
    ]
    Subject to syntactical or other errors :-)

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Oct 2007
    Posts
    3
    Thanks to all for the help. I took another look at the insertWord, the result must be off by 2 or more on occasion. So adding changing if to while causes the program to make the proper adjustment. and I also had to look at the previous item in the array to see if I was inserting to far back in the list. Once again using a while list to make the adjustment if the difference is more than one. This code should work for my needs. Thanks again.

    Code:
    void insertWord( char ***dictionary, int *count, int *capacity, char *word )
    {
       int wordLocation      = 0;
       int i = 0;
    
      if (*count == *capacity) /* need to double my array's capacity! */
             resizeDictionary(dictionary, capacity);
    
    /* find the location of where the word should reside. For the first few words the wordLocation
     * method does not seem to work, so I plan on doing the first few words manually. */
       wordLocation = indexOfWord( *dictionary, *count, word );
    
       if (wordLocation < 1 || *count == 0)
       {
    /*    A negative word location means that the word is not in the array */
          wordLocation *= -1;
    /*    check to see if I am off by one, if so adjust the index. */
          if (*count > 0 )
          {
             while (wordLocation < *count && strcmp(word, (*dictionary)[wordLocation])   > 0) ++wordLocation;
    
             while (wordLocation > 0      && strcmp(word, (*dictionary)[wordLocation-1]) < 0) --wordLocation; 
          }
    
    /*      printDictionary( *dictionary, *count, stdout );
          printf("%d \n",wordLocation );
    */
    /*    The for loop is written to start at the end and move every string out one position.
     *    thus freeing up a spot the the please returned by indexOfWordReturn */
          for (i=(*count)-1 ; i>=wordLocation ; --i )
          {
            (*dictionary)[i+1] = (*dictionary)[i];
          }
    /*    copy ptr to the correct location in the dictionary array */
          ++*count;
          (*dictionary)[wordLocation] = word;
       }
    }

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You really SHOULD find WHY you're not getting the right index out of your search. This may have fixed the apparent problem, but it should be perfectly possible to get a binary search to come up with EXACTLY the right position.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Getting an error with OpenGL: collect2: ld returned 1 exit status
    By Lorgon Jortle in forum C++ Programming
    Replies: 6
    Last Post: 05-08-2009, 08:18 PM
  2. doubt in c parser coding
    By akshara.sinha in forum C Programming
    Replies: 4
    Last Post: 12-23-2007, 01:49 PM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM
  5. Hi, could someone help me with arrays?
    By goodn in forum C Programming
    Replies: 20
    Last Post: 10-18-2001, 09:48 AM