search algorithm - making statistical decisions

**Mario F.** · 11-18-2006

Let's assume I have two containers.

Code:

std::vector<unsigned long> vec1;
std::vector<unsigned long> vec2;

Both containers will hold a large number of possible values. However, they are mutually exclusive. Also, the containers are both already sorted.

Now,
I want to search one particular value and the search will be done repeteadly for a large number of values. Naturally, I start with one container and if I don't find the value, I move on to the other.

But... what about optimizing this process based on probabilities? Instead of simply starting on Vec1 and then moving to Vec2, I'm thinking avoiding the best I can the linear compexity of the find algorithm and write my own, based also on the fact the vectors are sorted.

For this I first determine the distance of the value to search (lets call it val) from both containers boundaries:

Code:

unsigned long distVec1Max = *--vec1.end() - val; // and the same for vec2
unsigned long distVec1Min = val - vec1.begin();

Based on this info I decide which container to search first. Typically I want to first search the vector in which distance to the value to be found is less, doing a reverse search if appropriate. If I reach a value that is higher than the value being searched (or lower if doing a reverse search), I stop and move on to the next container.

Question is... Can you see a possible gain if I do this? The calculations prior to the search and the switching of containers can put a tax on the performance that is probabilistically not justifiable, considering the alternative of doing a linear search on each vector?

(note: A set would solve this, of course. But I just used a simple example here with containers of unsigned long. Let's assume the contents do not, and cannot, have a < operator.)

EDIT: Another alternative is searching both containers at the same time, one index at a time, still using the distance to determine what kind of search to do on each container (if normal or reversed)....

**robatino** · 11-18-2006

I'm missing something - can't you use binary_search, with log(N) time?

Edit: Actually, it would be lower_bound and/or upper_bound.

Edit: Or better yet, equal_range (if you need all of them).

**Salem** · 11-18-2006

If they're already sorted, then I'd be doing "bsearch" on them, if they were arrays in C.

Or in C++, this
http://www.sgi.com/tech/stl/binary_search.html

**Mario F.** · 11-18-2006

Actually I can... and I also contradicted myself when I said the values cannot have a < operator. They must have if I want them sorted... duh!

I obviously didn't give this much though before jumping on the forums...

Thread: search algorithm - making statistical decisions

Thread Tools

Search Thread

Display

search algorithm - making statistical decisions

Similar Threads

Contest Results - May 27, 2002

Again Character Count, Word Count and String Search

Making a search engine.

Ornery binary search algorithm

Help with some complex decision making