search algorithm - making statistical decisions
Let's assume I have two containers.
Both containers hold a large number of possible values; their contents are mutually exclusive (no value appears in both), and both are already sorted.
std::vector<unsigned long> vec1;
std::vector<unsigned long> vec2;
I want to search for one particular value, and the search will be done repeatedly for a large number of values. Naturally, I start with one container and, if I don't find the value, I move on to the other.
But... what about optimizing this process based on probabilities? Instead of simply starting on vec1 and then moving to vec2, I'm thinking of avoiding, as best I can, the linear complexity of the find algorithm and writing my own search, based also on the fact that the vectors are sorted.
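As a baseline worth measuring against (for the unsigned long example at least; your note below about the real contents lacking a < operator would rule this out for them), the sortedness already buys an O(log n) search per container via std::lower_bound. A minimal sketch, with function names of my own choosing:

```cpp
#include <algorithm>
#include <vector>

// Binary-search baseline: O(log n) per container, relying only on
// the vectors being sorted ascending and the elements being ordered
// (true for unsigned long).
bool containsSorted(const std::vector<unsigned long>& v, unsigned long val) {
    auto it = std::lower_bound(v.begin(), v.end(), val);
    return it != v.end() && *it == val;
}

// Check both containers; which one to try first barely matters here,
// since each probe is logarithmic.
bool containsEither(const std::vector<unsigned long>& v1,
                    const std::vector<unsigned long>& v2,
                    unsigned long val) {
    return containsSorted(v1, val) || containsSorted(v2, val);
}
```

Any hand-rolled heuristic would have to beat this, and two binary searches are usually cheap enough that picking the "right" container first saves very little.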
For this I first determine the distance of the value to search for (let's call it val) from both containers' boundaries:
unsigned long distVec1Max = vec1.back() - val;  // and the same for vec2
unsigned long distVec1Min = val - vec1.front(); // front()/back(), not iterators
Based on this info I decide which container to search first. Typically I want to first search the vector in which the distance to the value to be found is smaller, doing a reverse search if appropriate. If I reach a value that is higher than the value being searched for (or lower, if doing a reverse search), I stop and move on to the other container.
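A minimal sketch of that early-stopping scan from the nearer boundary (the function name is mine, and it assumes a non-empty vector sorted ascending):

```cpp
#include <vector>

// Scan linearly from whichever end of v is closer to val, stopping
// as soon as the scan passes val. Assumes v is non-empty and sorted
// ascending. Still O(n) in the worst case: val near the middle.
bool scanFromNearestEnd(const std::vector<unsigned long>& v, unsigned long val) {
    if (val < v.front() || val > v.back()) return false; // outside range
    if (val - v.front() <= v.back() - val) {
        // Closer to the low end: forward scan.
        for (auto x : v) {
            if (x == val) return true;
            if (x > val) return false;   // passed it: stop early
        }
    } else {
        // Closer to the high end: reverse scan.
        for (auto it = v.rbegin(); it != v.rend(); ++it) {
            if (*it == val) return true;
            if (*it < val) return false; // passed it: stop early
        }
    }
    return false;
}
```

Note that the distance to a boundary only bounds how far the *value* is numerically, not how many *elements* lie between the boundary and val, so the expected scan length depends heavily on how the values are distributed.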
Question is... can you see a possible gain if I do this? Or could the calculations prior to the search, plus the switching between containers, impose a cost that is probabilistically not justifiable compared with the alternative of simply doing a linear search on each vector?
(Note: a set would solve this, of course, but I just used a simple example here with containers of unsigned long. Let's assume the real contents do not, and cannot, have a < operator.)
EDIT: Another alternative is searching both containers at the same time, one index at a time, still using the distances to determine what kind of search to do on each container (normal or reversed).
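A hypothetical sketch of that interleaved scheme (all names are mine; it assumes both vectors are sorted ascending, and again falls back on unsigned long comparisons):

```cpp
#include <vector>
#include <cstddef>

// Advance one index per container per round, each container scanned
// from whichever end is nearer to val; a container drops out once its
// scan has passed val or run off the end.
bool interleavedFind(const std::vector<unsigned long>& v1,
                     const std::vector<unsigned long>& v2,
                     unsigned long val) {
    auto step = [val](const std::vector<unsigned long>& v,
                      std::size_t& i, bool forward, bool& done) -> bool {
        if (done || i >= v.size()) { done = true; return false; }
        unsigned long x = forward ? v[i] : v[v.size() - 1 - i];
        if (x == val) return true;
        // Passed the target: this container cannot contain val anymore.
        if ((forward && x > val) || (!forward && x < val)) done = true;
        ++i;
        return false;
    };
    auto forwardScan = [val](const std::vector<unsigned long>& v) {
        return !v.empty() && val - v.front() <= v.back() - val;
    };
    // A container is done from the start if val is outside its range.
    bool d1 = v1.empty() || val < v1.front() || val > v1.back();
    bool d2 = v2.empty() || val < v2.front() || val > v2.back();
    bool f1 = forwardScan(v1), f2 = forwardScan(v2);
    std::size_t i1 = 0, i2 = 0;
    while (!d1 || !d2) {
        if (step(v1, i1, f1, d1)) return true;
        if (step(v2, i2, f2, d2)) return true;
    }
    return false;
}
```

This bounds the wasted work in the "wrong" container by the work done in the right one, at the cost of extra branching per step; whether that beats two sequential bounded scans is something only a benchmark on your real data can settle.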