Thread: Search question

  1. #1
    Widdle Coding Peon Aerie's Avatar
    Join Date
    Dec 2004
    Posts
    115

    Search question

    I was thinking about search functions, and how they work, and while I imagine this has been done before, I want to know if it's a good idea:
    Code:
    int search(const char target[], const char strtofind[], int stfsiz)
    {
        int i = 0;
        int dectr;

        /* stop once the search string can no longer fit in what's left */
        while ((i + stfsiz) <= MAXTARGETSIZE && target[i] != '\0') {
            if (target[i] == strtofind[0]) {
                /* candidate found: compare from the last character backward */
                dectr = stfsiz - 1;
                while (dectr >= 0 && target[i + dectr] == strtofind[dectr])
                    dectr--;
                if (dectr < 0)
                    return i;    /* every character matched */
            }
            i++;
        }
        return -1;
    }

    The logic here is that if the string you're searching for is very long, it's probably a good idea to start at the end of it and 'search back' from where its last character would fall, assuming i is the start of a match. This rests on the assumption that a non-match is more likely to diverge from the search string far from the beginning than near it.

    I.e., while searching for, say, "mushroom" in an arbitrary string of letters, I'm assuming I'm less likely to encounter an m, then an o, then another o while comparing backward from the i+7th character than I am to find a u, s, and h counting forward.

    I know there are better ways to implement this, but hopefully it demonstrates the idea without me having to work too hard on something I'm not even sure is valid.
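
    For example, with MAXTARGETSIZE #defined as the size of the target buffer, calling it would go something like this (just a quick throwaway test, nothing rigorous):
    Code:
    #include <stdio.h>
    #include <string.h>

    #define MAXTARGETSIZE 64

    int search(const char target[], const char strtofind[], int stfsiz);

    int main(void)
    {
        const char text[MAXTARGETSIZE] = "a field full of mushrooms";

        /* prints "found at index 16" */
        int pos = search(text, "mushroom", (int)strlen("mushroom"));

        if (pos >= 0)
            printf("found at index %d\n", pos);
        else
            printf("not found\n");
        return 0;
    }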
    I live in a giant bucket.

  2. #2
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    What exactly are you trying to improve? Accuracy? Speed and Efficiency? In either case, I don't see how searching backwards would aid you. You still need to see the whole string no matter what.

  3. #3
    Widdle Coding Peon Aerie's Avatar
    Join Date
    Dec 2004
    Posts
    115
    I'm trying to increase the speed at which a non-hit will be detected.

    Basically, the answer to the question I have is mostly statistical, since it deals with the chance a stream of data will diverge/reconverge with the string being searched for.

    So maybe I should go find a statistics board and ask there...
    I live in a giant bucket.

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >I'm trying to increase the speed at which a non-hit will be detected.
    Just for fun, try something like this:
    Code:
    loop over each position i in text
      if length(text) - i < length(match)
        return not found
      if text[i] == match[0] and text[i + length(match) - 1] == match[length(match) - 1]
        do thorough test
    If the first and last characters of both the text and search string match, what's the probability that it'll be a hit? What if the first, last, and middle characters match? The trick for optimizing for search misses is to minimize how many comparisons you make. This is more difficult for general search algorithms because you can't assume anything about the data. But it's an entertaining exercise, if nothing else.
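
    In C it might come out roughly like this (off the top of my head, so treat it as a sketch rather than tested code, and quick_search is just a throwaway name):
    Code:
    #include <string.h>

    /* sketch of the first/last filter: check the cheap endpoints before
       paying for a full memcmp of the candidate window */
    int quick_search(const char *text, const char *match)
    {
        size_t tlen = strlen(text);
        size_t mlen = strlen(match);
        size_t i;

        if (mlen == 0 || mlen > tlen)
            return -1;

        for (i = 0; i + mlen <= tlen; i++) {
            if (text[i] == match[0] && text[i + mlen - 1] == match[mlen - 1]) {
                /* endpoints agree, now do the thorough test */
                if (memcmp(text + i, match, mlen) == 0)
                    return (int)i;
            }
        }
        return -1;
    }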
    My best code is written with the delete key.

  5. #5
    Widdle Coding Peon Aerie's Avatar
    Join Date
    Dec 2004
    Posts
    115
    Prelude:
    That's pretty much how I was operating: assuming that as long as you keep the number of operations per comparison low, and spread the comparison points around so they're at least not strictly linear, some performance would be gained...

    The code you just wrote out was actually nearly identical to another idea I had, though I was considering extending it to perform a rotating search from the 'middle out' of the target section of the array. I think I might enjoy writing search functions once I get a better grip on C overall.
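
    The middle-out check of a single candidate position would go something like this (a rough, untested sketch with made-up names; the caller would have to make sure pos + stfsiz stays inside the target):
    Code:
    /* compare a candidate position starting at the middle of the search
       string and alternating outward, on the theory that a mismatch
       turns up sooner that way */
    int middle_out_match(const char target[], int pos, const char strtofind[], int stfsiz)
    {
        int mid = stfsiz / 2;
        int offset;

        for (offset = 0; offset <= mid; offset++) {
            /* check mid+offset, then mid-offset */
            if (mid + offset < stfsiz &&
                target[pos + mid + offset] != strtofind[mid + offset])
                return 0;
            if (mid - offset >= 0 &&
                target[pos + mid - offset] != strtofind[mid - offset])
                return 0;
        }
        return 1;    /* all characters matched */
    }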
    I live in a giant bucket.

  6. #6
    The C-er
    Join Date
    Mar 2004
    Posts
    192
    Hi Aerie,

    I think I see what you're saying. Is it to do with spatial locality of the data set?

  7. #7
    verbose cat
    Join Date
    Jun 2003
    Posts
    209
    The problem with not being able to make assumptions about your data (such as alphabetical order, most large words are near the end, etc.) is that whether you start your search at the beginning or start from some place else, you still have to go through the same number of comparisons for any given search. Given a data set:

    C F A D E B G

    If you are looking for 'A' and your algorithm starts in the middle, you still have to do at least two comparisons. Looking for 'G' would still require you to compare every item in the list.

  8. #8
    Registered User
    Join Date
    Dec 2004
    Posts
    5
    May I suggest a more general linear-time algorithm such as KMP? Tricks such as starting in the middle or starting at the end rely on assumptions about the data that, as mentioned previously, aren't great to make.

    KMP stands for Knuth-Morris-Pratt.
    Try Googling. :-)

    I believe string searching takes linear time at a minimum (i.e., there's no magic algorithm that will find the match without looking through the whole text at least once in the worst case).
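
    If it helps, the rough shape of KMP is something like this (written from memory, so double-check it against a proper reference before relying on it):
    Code:
    #include <stdlib.h>
    #include <string.h>

    /* KMP, roughly: build a table of how far the pattern can 'fall back'
       after a mismatch, so text characters are never re-examined */
    int kmp_search(const char *text, const char *pat)
    {
        int n = (int)strlen(text);
        int m = (int)strlen(pat);
        int *fail, i, k;
        int result = -1;

        if (m == 0)
            return 0;
        fail = malloc(m * sizeof *fail);
        if (fail == NULL)
            return -1;

        /* failure table: fail[i] = length of the longest proper prefix of
           pat[0..i] that is also a suffix of it */
        fail[0] = 0;
        for (i = 1, k = 0; i < m; i++) {
            while (k > 0 && pat[i] != pat[k])
                k = fail[k - 1];
            if (pat[i] == pat[k])
                k++;
            fail[i] = k;
        }

        /* scan the text, reusing the table on mismatches */
        for (i = 0, k = 0; i < n; i++) {
            while (k > 0 && text[i] != pat[k])
                k = fail[k - 1];
            if (text[i] == pat[k])
                k++;
            if (k == m) {
                result = i - m + 1;
                break;
            }
        }
        free(fail);
        return result;
    }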

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    "boyer moore" is another fast string searching algorithm (just for interest)

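    Horspool's simplified variant of Boyer-Moore fits in a few lines: compare each window from its last character backward (much like the original idea in this thread), and on a mismatch skip ahead by a precomputed amount. Roughly like this (untested sketch):
    Code:
    #include <limits.h>
    #include <string.h>

    /* Horspool's simplification of Boyer-Moore: on a mismatch, skip ahead
       using a table keyed on the text character under the end of the window */
    int horspool_search(const char *text, const char *pat)
    {
        size_t n = strlen(text);
        size_t m = strlen(pat);
        size_t skip[UCHAR_MAX + 1];
        size_t i, k;

        if (m == 0 || m > n)
            return -1;

        /* by default a bad character lets us skip the whole pattern length */
        for (i = 0; i <= UCHAR_MAX; i++)
            skip[i] = m;
        /* characters that occur inside the pattern allow a shorter skip */
        for (i = 0; i < m - 1; i++)
            skip[(unsigned char)pat[i]] = m - 1 - i;

        i = 0;
        while (i + m <= n) {
            k = m - 1;
            while (text[i + k] == pat[k]) {
                if (k == 0)
                    return (int)i;
                k--;
            }
            i += skip[(unsigned char)text[i + m - 1]];
        }
        return -1;
    }
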
  10. #10
    Widdle Coding Peon Aerie's Avatar
    Join Date
    Dec 2004
    Posts
    115
    Of course you cannot implement a search algo that avoids checking the full length of the search string, but there are surely ways, without compromising that stipulation, to compare all the pertinent elements in an order that detects non-hits sooner.

    Almost all search scenarios have one reliable property of their target data sets: the 'non-hits' will vastly outnumber the 'hits'.

    In other words, most of the time it doesn't really matter how long it takes to assure yourself that a potential match actually is a match. That time penalty is offset by how efficiently you eliminate all the non-hits that don't share much similarity with the search string, assuming the target doesn't contain an inordinately high number of those 'almost-hits'.
    I live in a giant bucket.

  11. #11
    Widdle Coding Peon Aerie's Avatar
    Join Date
    Dec 2004
    Posts
    115
    PS: I'll also take a look at the algos mentioned; maybe they'll give me some ideas (or show me how poor my current ones are).
    I live in a giant bucket.

  12. #12
    Registered User
    Join Date
    Dec 2004
    Posts
    5
    I just mentioned KMP because...

    (1) KMP is proven correct mathematically! :-) (I'm a computer science major.)

    (2) Yes, it is true that you can try to turn up non-matches sooner, but how will you prove that this works in the general case?

    I believe it took years and years to come up with a good string matching algorithm. A new algorithm would probably be welcomed, but the real difficulty is proving that it works.
