C Board  

Go Back   C Board > Community Boards > Article Discussions

Reply
 
LinkBack Thread Tools Display Modes
Old 01-28-2005, 10:23 PM   #1
Lead Moderator
 
kermi3's Avatar
 
Join Date: Aug 1998
Posts: 2,568
Our First Article - Searching Techniques By Prelude

Here is our first article for discussion. It is on Searching Techniques.
__________________
Kermi3

If you're new to the boards, welcome and reading this will help you get started.
Information on code tags may be found here

- Sandlot is the highest form of sport.
kermi3 is offline   Reply With Quote
Old 01-28-2005, 11:36 PM   #2
Software Developer
 
jverkoey's Avatar
 
Join Date: Feb 2003
Location: University of Waterloo
Posts: 1,916
Very good article, good read, however, in the first few lines it talks about hashing, and I personally haven't ever worked with hashing before...is it relatively simple to implement as far as a searching algorithm goes? I would be really interested in learning how to use this sort of thing and also, what data type works best with hashing...if any?

Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two things the same thing in a sense?
jverkoey is offline   Reply With Quote
Old 01-29-2005, 02:09 AM   #3
Registered User
 
major_small's Avatar
 
Join Date: May 2003
Posts: 2,787
well written indeed. my only thought is that the first bit of code may be overly complex for what it is demonstrating. what I mean is that some people may get lost in the syntax, when what is more important is the algorithm behind it.

for example, something like this may get the point across better:
Code:
#include<iostream>

int main()
{
    int arr[]={1,32,4,6,4,253,6,1,24,10,35};
    int size=sizeof(arr)/sizeof(int);
    int index=-1;
    
    for(int i=0;i<size;i++)
    {
        if(arr[i]==10)
        {
            index=i;
            break;
        }
    }
    
    if(index>=0)
        std::cout<<"10 was found at index "<<index<<std::flush;
    else
        std::cout<<"10 was not found"<<std::flush;
    
    std::cin.get();
    return 0;
}
__________________
Join is in our Unofficial Cprog IRC channel
Server: irc.phoenixradio.org
Channel: #Tech


Team Cprog Folding@Home: Team #43476
Download it Here
Detailed Stats Here
More Detailed Stats
52 Members so far, are YOU a member?
Current team score: 1223226 (ranked 374 of 45152)

The CBoard team is doing better than 99.16% of the other teams
Top 5 Members: Xterria(518175), pianorain(118517), Bennet(64957), JaWiB(55610), alphaoide(44374)

Last Updated on: Wed, 30 Aug, 2006 @ 2:30 PM EDT
major_small is offline   Reply With Quote
Old 01-29-2005, 01:00 PM   #4
Crazy Fool
 
Perspective's Avatar
 
Join Date: Jan 2003
Location: Canada
Posts: 2,588
Quote:
Originally Posted by jverkoey
Very good article, good read, however, in the first few lines it talks about hashing, and I personally haven't ever worked with hashing before...is it relatively simple to implement as far as a searching algorithm goes?
The algorithm for searching structures built with hashing is very simple. You run your hash function on the object being searched for to get an index for that object. You then check the index for the object; if the object is at that index, your done! if the object found at the index is null, the structure does not contain the object. Lastley, if the there is an object at the index, but its not the object your searching for, you have a collision; you then iterate by a means determined by your structure (often sequentially) untill you either find the object (its found at index + iteration) or you find a null entry (the object doesnt exist in the structure)

Quote:
Originally Posted by jverkoey
what data type works best with hashing...if any?
Any data type that you can come up with a hashing function for will work just fine in a hash structure. The key part of any hashing structure is to have a good hash function. A poor hash function will result in too many collisions reducing performance.

Quote:
Originally Posted by jverkoey
Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two things the same thing in a sense?
I can't see why one would be faster than the other on a theoretical level. Perhaps someone else can comment on the difference in performance.
Perspective is offline   Reply With Quote
Old 01-29-2005, 01:10 PM   #5
Crazy Fool
 
Perspective's Avatar
 
Join Date: Jan 2003
Location: Canada
Posts: 2,588
Quote:
Originally Posted by major_small
my only thought is that the first bit of code may be overly complex for what it is demonstrating. what I mean is that some people may get lost in the syntax, when what is more important is the algorithm behind it.
I'd have to disagree here. I think anyone persuing this subject should have a strong enough grasp on programming to understand (or figure out) the examples through experimentation both on the computer and with examples on paper as mentioned in the article.

My biggest complaint with the example you posted would be that the it doesnt offer an "as is" usable function because the entire algorithm is coded in main. Algorithms like this should be broken into [re]usable components. The code in the article is usable "as is".

I do agree with your underlying point though, the article could supply a driver program to reduce confusion and give the reader something concrete to exeriment with. (though i did notice that writing a driver program was the intent of Exercise 2)
Perspective is offline   Reply With Quote
Old 01-29-2005, 02:26 PM   #6
+++ OK NO CARRIER
 
quzah's Avatar
 
Join Date: Oct 2001
Posts: 10,262
A binary tree is probably faster than a binary search. With a tree, all you're doing is moving down to the next pointer. With a binary search, you're preforming at least some math each time. I suspect it will always be faster to simply access one more pointer than it will be to find the next spot to jump to using said math.
Code:
middle = ( beginning + end ) >> 1;
Even if we shift there, you've still got addition plus a shift. Where as with a tree, you just move to the next pointer. So assuming the comparison test is identical, and you've both got one assignment, the tree should still be faster.

If I'm incorrect, feel free to let me know, because I haven't played around calculating notations and the like.

Quzah.
__________________
Hundreds of thousands of dipshits can't be wrong.


Are you up for the suck?
quzah is offline   Reply With Quote
Old 01-29-2005, 03:56 PM   #7
and the hat of marbles
 
Sang-drax's Avatar
 
Join Date: May 2002
Location: Göteborg, Sweden
Posts: 2,038
Good article.

Perhaps it could be mentioned that interpolation search will only be O(log log n) if the data is uniformly distributed in the range between low and high, because the interploation used is linear, unlike binary search where O(log n) is the worst case.

Quote:
Originally Posted by Perspective
I'd have to disagree here. I think anyone persuing this subject should have a strong enough grasp on programming to understand (or figure out) the examples through experimentation both on the computer and with examples on paper as mentioned in the article.
Yes, but sequential search isn't a very advanced topic; it's a topic which many newbies are struggling with. I don't think an example like Major Small's suggestion would be out of place.
__________________
Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling
Sang-drax is offline   Reply With Quote
Old 01-29-2005, 04:34 PM   #8
and the hat of marbles
 
Sang-drax's Avatar
 
Join Date: May 2002
Location: Göteborg, Sweden
Posts: 2,038
If I'm not mistaken, interpolation search can actually be slower than binary search even for extremely huge lists, if the entries in the list are chosen carefully.

Consider the list L with N elements (sorted).
The element l[n] at position n is defined as:
l(n) = N^n

So, the list looks like this:
{N^0, N^1, N^2, ... , N^N}

When searching the list with binary search, the time complexity will be on average O(log2 n), because each iteration halves the number of entries where the wanted element might be.

Now, when using interpolation search, the first iteration will yield (using Prelude's variable names):
key = N^n
low_diff = N^n - 1
range_diff = N^N - 1
count_diff = N - 0 = N
range = N*(N^n - 1) / (N^N - 1) + 0

range < N*(N^n - 1) / N^N = 1/ N^(N-n-1) - 1/N^(N-1) < 1 for all n<N.

So, if the element searched for isn't the last (n<N), the first calculation of range will equal 0 (when truncated) and the new value of low will be 1 instead of 0.
It continues like this, the low boundary will increase by one each iteration, so the complexity of the interpolation search will be O(n).


Note that this is by no means any argument against using interpolation search. This is more of theoretical interest, if it's of any interest at all.



EDIT:
Hmm, actually the list hasn't N elements; it has N+1. The argument still holds though. Remove the first element from the list (N^0).
__________________
Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

Last edited by Sang-drax; 01-29-2005 at 04:40 PM.
Sang-drax is offline   Reply With Quote
Old 01-29-2005, 05:49 PM   #9
Carnivore ('-'v)
 
Hunter2's Avatar
 
Join Date: May 2002
Posts: 2,865
An idea about a possible application for the article (where hashing is impractical):

You may have a hash table that 'maps' keys to values, but what if you want the inverse operation (find the key of a value)? I suppose you could generate two different hash tables, one for each direction, but if it's more difficult to hash the value than the key (i.e. integer key, structure value), it may be more expedient to simply implement a search for the inverse operation rather than going to the trouble of coming up with an effective hash algorithm.
__________________
Just Google It. √

(\ /)
( . .)
c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.
Hunter2 is offline   Reply With Quote
Old 01-29-2005, 11:12 PM   #10
Magically delicious
 
LuckY's Avatar
 
Join Date: Oct 2001
Posts: 856
Quote:
I personally haven't ever worked with hashing before...is it relatively simple to implement as far as a searching algorithm goes?
I don't know what it is, but the very idea of using hashing is very appealing to me. You take a value, any value of any type, and run it through a process that performs arithmetic on the bits and generates a numeric value which is used as an index. It's like magic. Of course, as Perspective mentioned, the main thing to be concerned with is that you use an algorithm that does not generate the same hash value for two unique values (or, at least one that is very unlikely to cause said collisions except extremely infrequently). As far as the difficulty in implementing a search algorithm goes, it's extremely simple. You just use the index returned from your hash function and if a value is there, you've found it. In the very simplest form, a search would have a complexity of O(1) [you could do something like `if (series[hashFunc("value")] != NULL) ...`].

Quote:
Also, the article mentions binary trees for searching faster, but would a binary tree be faster than just implementing a binary search, or are the two things the same thing in a sense?
Strictly speaking, a binary search tree does not have a better complexity than a binary search. In fact, it has a worse complexity. In the worst possible case, a binary search tree can require the traversal of all elements in a search (O(n)), but a binary search has a much better complexity (O(log n)). This is because the shape of the tree cannot be guaranteed and if many sorted elements are inserted (yielding a tree with repeated right-hand children), it will become essentially a list instead of a tree. Now, an AVL tree (a binary search tree that is reformed as necessary after any insertion/deletion to retain a true tree shape) is much more comparable in complexity to a binary search. It also yields O(log n), so any difference in their application would be constant time, thus virtually identical (theoretically speaking).
LuckY is offline   Reply With Quote
Old 01-30-2005, 02:59 AM   #11
Software Developer
 
jverkoey's Avatar
 
Join Date: Feb 2003
Location: University of Waterloo
Posts: 1,916
Quote:
Originally Posted by LuckY
You take a value, any value of any type, and run it through a process that performs arithmetic on the bits and generates a numeric value which is used as an index. It's like magic.
Oh, I get it now, I've done this sort of thing before without realizing it. I was writing a .lwo renderer and the tags in the file are all four letter tags, so I made a function that when passed a string of 4 letters would return a unique integer value. *Rummages through code*, here we go:

Code:
unsigned int ParseTag(char* tag)
{
   srand((unsigned int)tag[0]);
   return (tag[0] + rand()%128) + (tag[1] + rand()%64) + (tag[2] + rand()%32) + (tag[3] + rand()%16);
}
And the function just assumes you send the proper size of array. However, as I ran through the list of tags, I came up with quite a duplicate tag every now and then which required changing the algorithm around to work with all of them.....So is there some universal hashing algorithm that works best with certain things? Or is the method I used above pretty much the best way to go?
jverkoey is offline   Reply With Quote
Old 01-30-2005, 11:14 AM   #12
Cat without Hat
 
CornedBee's Avatar
 
Join Date: Apr 2003
Posts: 8,439
The argument about hashing - well, the article talks about searching algorithms on common (perhaps sorted) arrays and linear lists. Not about binary trees or hash tables. In that case, skiplists should be discussed too. So perhaps we should have an article about data structures.

Hash functions must be deterministic, jverkoey. They must not contain a random element.
__________________
All the buzzt!
CornedBee

"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
CornedBee is offline   Reply With Quote
Old 01-30-2005, 03:21 PM   #13
Super Moderator
 
Bubba's Avatar
 
Join Date: Aug 2001
Posts: 7,472
Good article. And I don't think the code is hard to read at all. Rather the functions look very succint and to the point. Very easy to follow.

So what's with having Prelude write the first article....yeah like we are all just gonna jump right into writing articles now. Tough act to follow.

Bubba is offline   Reply With Quote
Old 01-30-2005, 07:28 PM   #14
Toaster
 
Zach L.'s Avatar
 
Join Date: Aug 2001
Posts: 2,686
Quote:
Originally Posted by CornedBee
Hash functions must be deterministic, jverkoey. They must not contain a random element.
Well, as an accident of the way rand( ) is normally written (i.e. not hooked up to a lava lamp or something similar), that piece of code is deterministic, though a more proper hash function would define its own utility functions instead of using a dubiously named function (rand).

Anyway, a great article. I much prefer the code in usable functions as they are as opposed to in a small program as major_small suggested. The target audience could rather readily write small test programs that use these functions.

I really liked Prelude's use of problems in the article as well. They were all of an appropriate level, I thought, and all very valuable to the learning process. The code was clear as well.

I'll comment more (in a more helpful manner) later on.

Cheers
__________________
The word rap as it applies to music is the result of a peculiar phonological rule which has stripped the word of its initial voiceless velar stop.
Zach L. is offline   Reply With Quote
Old 01-31-2005, 03:04 AM   #15
Cat without Hat
 
CornedBee's Avatar
 
Join Date: Apr 2003
Posts: 8,439
Right, Zach, my fault. I didn't look at the argument to srand() and assumed the usual time(NULL).
__________________
All the buzzt!
CornedBee

"There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
- Flon's Law
CornedBee is offline   Reply With Quote
Reply

Thread Tools
Display Modes

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Logical errors with seach function Taka C Programming 4 09-18-2006 05:20 AM
C++ std routines siavoshkc C++ Programming 33 07-28-2006 12:13 AM
Visual C++ 2005 linking and file sizes Rune Hunter C++ Programming 2 11-12-2005 10:41 PM
Article in GDM about DarkBasic PRO DavidP A Brief History of Cprogramming.com 2 09-03-2003 06:27 PM


All times are GMT -6. The time now is 08:49 AM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.0 RC2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22