Thread: Our First Article - Searching Techniques By Prelude

  1. #1
    Lead Moderator kermi3's Avatar
    Join Date
    Aug 1998
    Posts
    2,595

    Our First Article - Searching Techniques By Prelude

    Here is our first article for discussion. It is on Searching Techniques.
    Kermi3

    If you're new to the boards, welcome; reading this will help you get started.
    Information on code tags may be found here

    - Sandlot is the highest form of sport.

  2. #2
    Software Developer jverkoey's Avatar
    Join Date
    Feb 2003
    Location
    New York
    Posts
    1,905
    Very good article, good read. However, in the first few lines it talks about hashing, and I personally haven't ever worked with hashing before... is it relatively simple to implement as far as a searching algorithm goes? I would be really interested in learning how to use this sort of thing, and also, what data type works best with hashing... if any?

    Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two the same thing in a sense?

  3. #3
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    Well written indeed. My only thought is that the first bit of code may be overly complex for what it's demonstrating. What I mean is that some people may get lost in the syntax, when what matters more is the algorithm behind it.

    For example, something like this may get the point across better:
    Code:
    #include <iostream>
    
    int main()
    {
        int arr[]={1,32,4,6,4,253,6,1,24,10,35};
        int size=sizeof(arr)/sizeof(int);   // number of elements in the array
        int index=-1;                       // -1 means "not found"
        
        // sequential search: check every element until we hit 10
        for(int i=0;i<size;i++)
        {
            if(arr[i]==10)
            {
                index=i;
                break;
            }
        }
        
        if(index>=0)
            std::cout<<"10 was found at index "<<index<<std::flush;
        else
            std::cout<<"10 was not found"<<std::flush;
        
        std::cin.get();
        return 0;
    }
    Join us in our Unofficial Cprog IRC channel
    Server: irc.phoenixradio.org
    Channel: #Tech



  4. #4
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    Quote Originally Posted by jverkoey
    Very good article, good read. However, in the first few lines it talks about hashing, and I personally haven't ever worked with hashing before... is it relatively simple to implement as far as a searching algorithm goes?
    The algorithm for searching structures built with hashing is very simple. You run your hash function on the object being searched for to get an index for that object. You then check that index: if the object is at that index, you're done! If the entry at that index is null, the structure does not contain the object. Lastly, if there is an object at the index but it's not the object you're searching for, you have a collision; you then iterate by a means determined by your structure (often sequentially) until you either find the object (it's found at index + iteration) or you find a null entry (the object doesn't exist in the structure).
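    To make that concrete, here's a rough sketch of that lookup for an open-addressing table of C strings with linear probing (this isn't Prelude's code; the table size and the hash function are just made up for illustration):
    Code:
    #include <cstddef>
    #include <cstring>
    
    const std::size_t TABLE_SIZE = 101;
    
    /* Hypothetical hash function: any deterministic function that maps a
       string to [0, TABLE_SIZE) will do for this sketch. */
    std::size_t hash_string(const char *s)
    {
        std::size_t h = 0;
        while (*s != '\0')
            h = h * 31 + (unsigned char)*s++;
        return h % TABLE_SIZE;
    }
    
    /* Returns the index of key in table, or -1 if it isn't there.
       Assumes the table was filled with the same hash function and is
       never completely full. */
    int find(const char *table[], const char *key)
    {
        std::size_t start = hash_string(key);
    
        for (std::size_t probe = 0; probe < TABLE_SIZE; probe++)
        {
            std::size_t i = (start + probe) % TABLE_SIZE;
    
            if (table[i] == NULL)                 /* empty slot: key isn't in the table */
                return -1;
            if (std::strcmp(table[i], key) == 0)  /* found it at start + probe */
                return (int)i;
            /* otherwise it's a collision: keep probing sequentially */
        }
        return -1;
    }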

    Quote Originally Posted by jverkoey
    what data type works best with hashing...if any?
    Any data type that you can come up with a hashing function for will work just fine in a hash structure. The key part of any hashing structure is to have a good hash function. A poor hash function will result in too many collisions, reducing performance.
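    For example (these are common textbook-style hashes, not from the article), a function that just sums the characters sends every anagram of a word to the same bucket, while something like djb2 mixes each character's position into the value:
    Code:
    #include <cstddef>
    
    /* Poor: anagrams like "stop", "pots" and "tops" all get the same sum,
       so they always collide no matter how many buckets there are. */
    std::size_t bad_hash(const char *s, std::size_t buckets)
    {
        std::size_t sum = 0;
        while (*s)
            sum += (unsigned char)*s++;
        return sum % buckets;
    }
    
    /* Better: djb2 mixes each character into the running value, so
       similar keys tend to land in different buckets. */
    std::size_t djb2_hash(const char *s, std::size_t buckets)
    {
        std::size_t h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % buckets;
    }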

    Quote Originally Posted by jverkoey
    Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two the same thing in a sense?
    I can't see why one would be faster than the other on a theoretical level. Perhaps someone else can comment on the difference in performance.

  5. #5
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    Quote Originally Posted by major_small
    My only thought is that the first bit of code may be overly complex for what it's demonstrating. What I mean is that some people may get lost in the syntax, when what matters more is the algorithm behind it.
    I'd have to disagree here. I think anyone pursuing this subject should have a strong enough grasp on programming to understand (or figure out) the examples through experimentation, both on the computer and with examples on paper, as mentioned in the article.

    My biggest complaint with the example you posted would be that it doesn't offer an "as is" usable function, because the entire algorithm is coded in main. Algorithms like this should be broken into [re]usable components. The code in the article is usable "as is".

    I do agree with your underlying point, though: the article could supply a driver program to reduce confusion and give the reader something concrete to experiment with. (Though I did notice that writing a driver program was the intent of Exercise 2.)

  6. #6
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    A binary tree is probably faster than a binary search. With a tree, all you're doing is moving down to the next pointer. With a binary search, you're performing at least some math each time. I suspect it will always be faster to simply access one more pointer than it will be to find the next spot to jump to using said math.
    Code:
    middle = ( beginning + end ) >> 1;
    Even if we shift there, you've still got an addition plus a shift. Whereas with a tree, you just move to the next pointer. So assuming the comparison test is identical, and both have one assignment, the tree should still be faster.
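    Roughly, the two inner loops being compared look like this (just a sketch, not the article's code; both assume the data is already sorted, or already built into a tree):
    Code:
    #include <cstddef>
    
    /* Binary search step: an addition, a shift and a comparison per level. */
    int binary_search(const int *arr, std::size_t size, int key)
    {
        std::size_t low = 0, high = size;   /* searches [low, high) */
    
        while (low < high)
        {
            std::size_t middle = (low + high) >> 1;
    
            if (arr[middle] == key)
                return (int)middle;
            else if (arr[middle] < key)
                low = middle + 1;
            else
                high = middle;
        }
        return -1;
    }
    
    /* Tree search step: a comparison and a pointer move per level. */
    struct node { int value; node *left, *right; };
    
    const node *tree_search(const node *root, int key)
    {
        while (root != NULL)
        {
            if (key == root->value)
                return root;
            root = (key < root->value) ? root->left : root->right;
        }
        return NULL;
    }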

    If I'm incorrect, feel free to let me know, because I haven't played around calculating notations and the like.

    Quzah.
    Hope is the first step on the road to disappointment.

  7. #7
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Good article.

    Perhaps it could be mentioned that interpolation search will only be O(log log n) if the data is uniformly distributed in the range between low and high, because the interpolation used is linear, unlike binary search where O(log n) is the worst case.
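    For reference, a minimal interpolation search might look something like this (a sketch only; the variable names don't exactly follow Prelude's):
    Code:
    /* Returns the index of key in the sorted array arr, or -1 if absent. */
    int interpolation_search(const int *arr, int size, int key)
    {
        int low = 0;
        int high = size - 1;
    
        while (low <= high && key >= arr[low] && key <= arr[high])
        {
            if (arr[high] == arr[low])   /* avoid division by zero */
                return (arr[low] == key) ? low : -1;
    
            /* Linear interpolation: guess where key "should" sit if the
               values were evenly spread between arr[low] and arr[high]. */
            int pos = low + (int)((long long)(high - low) * ((long long)key - arr[low])
                                  / ((long long)arr[high] - arr[low]));
    
            if (arr[pos] == key)
                return pos;
            else if (arr[pos] < key)
                low = pos + 1;
            else
                high = pos - 1;
        }
        return -1;
    }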

    Quote Originally Posted by Perspective
    I'd have to disagree here. I think anyone pursuing this subject should have a strong enough grasp on programming to understand (or figure out) the examples through experimentation, both on the computer and with examples on paper, as mentioned in the article.
    Yes, but sequential search isn't a very advanced topic; it's a topic many newbies struggle with. I don't think an example like major_small's suggestion would be out of place.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  8. #8
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    If I'm not mistaken, interpolation search can actually be slower than binary search even for extremely huge lists, if the entries in the list are chosen carefully.

    Consider the list L with N elements (sorted).
    The element l[n] at position n is defined as:
    l[n] = N^n

    So, the list looks like this:
    {N^0, N^1, N^2, ... , N^N}

    When searching the list with binary search, the time complexity will be O(log N), because each iteration halves the number of entries where the wanted element might be.

    Now, when using interpolation search, the first iteration will yield (using Prelude's variable names):
    key = N^n
    low_diff = N^n - 1
    range_diff = N^N - 1
    count_diff = N - 0 = N
    range = N*(N^n - 1) / (N^N - 1) + 0

    range = N*(N^n - 1) / (N^N - 1) <= (N^N - N) / (N^N - 1) < 1 for all n < N.

    So, if the element searched for isn't the last (n<N), the first calculation of range will equal 0 (when truncated) and the new value of low will be 1 instead of 0.
    It continues like this: the low boundary increases by one each iteration, so the complexity of the interpolation search will be O(n).


    Note that this is by no means any argument against using interpolation search. This is more of theoretical interest, if it's of any interest at all.
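    A quick sanity check with numbers small enough to fit in 64-bit integers (a rough sketch, not from the article), counting probes while searching this list for its second-to-last element:
    Code:
    #include <iostream>
    
    int main()
    {
        /* N = 13, list = {N^1, N^2, ..., N^N}; 13^13 still fits in long long. */
        const int N = 13;
        long long list[N];
    
        list[0] = N;
        for (int i = 1; i < N; i++)
            list[i] = list[i - 1] * N;    /* list[i] = N^(i+1) */
    
        long long key = list[N - 2];      /* search for N^(N-1) */
        int low = 0, high = N - 1, probes = 0;
    
        while (low <= high && key >= list[low] && key <= list[high])
        {
            probes++;
            int pos = low + (int)((high - low) * (key - list[low])
                                  / (list[high] - list[low]));
            if (list[pos] == key)
                break;
            else if (list[pos] < key)
                low = pos + 1;
            else
                high = pos - 1;
        }
    
        /* Prints 12 probes, i.e. roughly N, where binary search would
           need about log2(N) = 4. */
        std::cout << "Probes: " << probes << '\n';
        return 0;
    }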



    EDIT:
    Hmm, actually the list doesn't have N elements; it has N+1. The argument still holds, though. Remove the first element from the list (N^0).
    Last edited by Sang-drax; 01-29-2005 at 04:40 PM.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  9. #9
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    An idea about a possible application for the article (where hashing is impractical):

    You may have a hash table that 'maps' keys to values, but what if you want the inverse operation (find the key of a value)? I suppose you could generate two different hash tables, one for each direction, but if it's more difficult to hash the value than the key (i.e. integer key, structure value), it may be more expedient to simply implement a search for the inverse operation rather than going to the trouble of coming up with an effective hash algorithm.
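    A sketch of what that fallback search could look like (the Record type and find_key_by_name are made up for illustration, and std::map just stands in for the table since there's no standard hash container):
    Code:
    #include <map>
    #include <string>
    
    struct Record { std::string name; int score; };
    
    /* The table maps key -> Record; to go the other way we just scan
       linearly for the Record whose name matches. Returns the key,
       or -1 if no entry matches. */
    int find_key_by_name(const std::map<int, Record> &table,
                         const std::string &name)
    {
        for (std::map<int, Record>::const_iterator it = table.begin();
             it != table.end(); ++it)
        {
            if (it->second.name == name)
                return it->first;
        }
        return -1;
    }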
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  10. #10
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Posts
    856
    Quote Originally Posted by jverkoey
    I personally haven't ever worked with hashing before... is it relatively simple to implement as far as a searching algorithm goes?
    I don't know what it is, but the very idea of hashing is very appealing to me. You take a value, any value of any type, and run it through a process that performs arithmetic on the bits and generates a numeric value which is used as an index. It's like magic. Of course, as Perspective mentioned, the main thing to be concerned with is using an algorithm that does not generate the same hash value for two distinct values (or at least one that causes such collisions only very rarely). As far as the difficulty of implementing the search goes, it's extremely simple: you just use the index returned from your hash function, and if a value is there, you've found it. In the very simplest form, a search has a complexity of O(1) [you could do something like `if (series[hashFunc("value")] != NULL) ...`].

    Quote Originally Posted by jverkoey
    Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two the same thing in a sense?
    Strictly speaking, a binary search tree does not have a better complexity than a binary search. In fact, it has a worse one. In the worst case, a binary search tree can require the traversal of all elements in a search (O(n)), while a binary search's worst case is much better (O(log n)). This is because the shape of the tree cannot be guaranteed: if many sorted elements are inserted (yielding a tree with only right-hand children), it becomes essentially a list instead of a tree. Now, an AVL tree (a binary search tree that is rebalanced as necessary after any insertion/deletion so it keeps a balanced shape) is much more comparable to a binary search. It also yields O(log n), so any difference in their application would be a constant factor, thus virtually identical (theoretically speaking).
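    A small demonstration of that degenerate case (a sketch, not the article's code):
    Code:
    #include <cstddef>
    #include <iostream>
    
    /* Inserting already-sorted keys into a plain, unbalanced binary search
       tree produces a list-like tree whose depth equals the element count.
       (Nodes are never freed; fine for a throwaway demo.) */
    struct node
    {
        int value;
        node *left;
        node *right;
        node(int v) : value(v), left(NULL), right(NULL) {}
    };
    
    void insert(node *&root, int value)
    {
        if (root == NULL)
            root = new node(value);
        else if (value < root->value)
            insert(root->left, value);
        else
            insert(root->right, value);
    }
    
    int depth(const node *root)
    {
        if (root == NULL)
            return 0;
        int l = depth(root->left);
        int r = depth(root->right);
        return 1 + (l > r ? l : r);
    }
    
    int main()
    {
        node *root = NULL;
    
        for (int i = 1; i <= 1000; i++)   /* sorted insertions: worst case */
            insert(root, i);
    
        std::cout << "Depth after 1000 sorted insertions: "
                  << depth(root) << '\n';  /* prints 1000, not ~10 */
        return 0;
    }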

  11. #11
    Software Developer jverkoey's Avatar
    Join Date
    Feb 2003
    Location
    New York
    Posts
    1,905
    Quote Originally Posted by LuckY
    You take a value, any value of any type, and run it through a process that performs arithmetic on the bits and generates a numeric value which is used as an index. It's like magic.
    Oh, I get it now, I've done this sort of thing before without realizing it. I was writing a .lwo renderer, and the tags in the file are all four-letter tags, so I made a function that, when passed a string of 4 letters, would return a unique integer value. *Rummages through code*, here we go:

    Code:
    unsigned int ParseTag(char* tag)
    {
       srand((unsigned int)tag[0]);
       return (tag[0] + rand()%128) + (tag[1] + rand()%64) + (tag[2] + rand()%32) + (tag[3] + rand()%16);
    }
    And the function just assumes you send the proper size of array. However, as I ran through the list of tags, I came up with quite a few duplicates every now and then, which required changing the algorithm around to work with all of them. So is there some universal hashing algorithm that works best with certain things? Or is the method I used above pretty much the best way to go?
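    For what it's worth (this isn't from the article, and strictly speaking it isn't a hash at all), a common trick for exactly-four-character tags is to pack the four bytes into one integer; distinct tags always map to distinct values, so there are no collisions, and the result is deterministic:
    Code:
    /* Packs a four-character tag like "TAGS" into a single integer.
       unsigned long is guaranteed to hold at least 32 bits. */
    unsigned long tag_to_id(const char tag[4])
    {
        return ((unsigned long)(unsigned char)tag[0] << 24) |
               ((unsigned long)(unsigned char)tag[1] << 16) |
               ((unsigned long)(unsigned char)tag[2] << 8)  |
                (unsigned long)(unsigned char)tag[3];
    }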

  12. #12
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    The argument about hashing - well, the article talks about searching algorithms on common (perhaps sorted) arrays and linear lists, not about binary trees or hash tables. If it did, skip lists would have to be discussed too. So perhaps we should have a separate article about data structures.

    Hash functions must be deterministic, jverkoey. They must not contain a random element.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  13. #13
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Good article. And I don't think the code is hard to read at all. Rather, the functions look very succinct and to the point. Very easy to follow.

    So what's with having Prelude write the first article... yeah, like we are all just gonna jump right into writing articles now. Tough act to follow.


  14. #14
    Toaster Zach L.'s Avatar
    Join Date
    Aug 2001
    Posts
    2,686
    Quote Originally Posted by CornedBee
    Hash functions must be deterministic, jverkoey. They must not contain a random element.
    Well, as an accident of the way rand() is normally written (i.e. not hooked up to a lava lamp or something similar), that piece of code is deterministic, though a more proper hash function would define its own utility functions instead of using a dubiously named function (rand).

    Anyway, a great article. I much prefer the code as usable functions, as it stands, to putting it in a small program as major_small suggested. The target audience could rather readily write small test programs that use these functions.

    I really liked Prelude's use of problems in the article as well. They were all of an appropriate level, I thought, and all very valuable to the learning process. The code was clear as well.

    I'll comment more (in a more helpful manner) later on.

    Cheers
    The word rap as it applies to music is the result of a peculiar phonological rule which has stripped the word of its initial voiceless velar stop.

  15. #15
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Right, Zach, my fault. I didn't look at the argument to srand() and assumed the usual time(NULL).
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
