search algorithm - making statistical decisions

This is a discussion on search algorithm - making statistical decisions within the C++ Programming forums, part of the General Programming Boards category; Let's assume I have two containers. Code: std::vector<unsigned long> vec1; std::vector<unsigned long> vec2; Both containers will hold a large number ...

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,474

    search algorithm - making statistical decisions

    Let's assume I have two containers.

    Code:
    std::vector<unsigned long> vec1;
    std::vector<unsigned long> vec2;
    Both containers will hold a large number of possible values. However, they are mutually exclusive. Also, the containers are both already sorted.

    Now,
    I want to search one particular value and the search will be done repeteadly for a large number of values. Naturally, I start with one container and if I don't find the value, I move on to the other.

    But... what about optimizing this process based on probabilities? Instead of simply starting on Vec1 and then moving to Vec2, I'm thinking avoiding the best I can the linear compexity of the find algorithm and write my own, based also on the fact the vectors are sorted.

    For this I first determine the distance of the value to search (lets call it val) from both containers boundaries:

    Code:
    unsigned long distVec1Max = *--vec1.end() - val; // and the same for vec2
    unsigned long distVec1Min = val - vec1.begin();
    Based on this info I decide which container to search first. Typically I want to first search the vector in which distance to the value to be found is less, doing a reverse search if appropriate. If I reach a value that is higher than the value being searched (or lower if doing a reverse search), I stop and move on to the next container.


    Question is... Can you see a possible gain if I do this? The calculations prior to the search and the switching of containers can put a tax on the performance that is probabilistically not justifiable, considering the alternative of doing a linear search on each vector?

    (note: A set would solve this, of course. But I just used a simple example here with containers of unsigned long. Let's assume the contents do not, and cannot, have a < operator.)

    EDIT: Another alternative is searching both containers at the same time, one index at a time, still using the distance to determine what kind of search to do on each container (if normal or reversed)....
    Last edited by Mario F.; 11-18-2006 at 08:51 AM.
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    I'm missing something - can't you use binary_search, with log(N) time?

    Edit: Actually, it would be lower_bound and/or upper_bound.

    Edit: Or better yet, equal_range (if you need all of them).
    Last edited by robatino; 11-18-2006 at 09:01 AM.

  3. #3
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,558
    If they're already sorted, then I'd be doing "bsearch" on them, if they were arrays in C.

    Or in C++, this
    http://www.sgi.com/tech/stl/binary_search.html
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  4. #4
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,474
    Actually I can... and I also contradicted myself when I said the values cannot have a < operator. They must have if I want them sorted... duh!

    I obviously didn't give this much though before jumping on the forums...
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Contest Results - May 27, 2002
    By ygfperson in forum A Brief History of Cprogramming.com
    Replies: 18
    Last Post: 06-18-2002, 01:27 PM
  2. Again Character Count, Word Count and String Search
    By client in forum C Programming
    Replies: 2
    Last Post: 05-09-2002, 11:40 AM
  3. Making a search engine.
    By sean in forum C++ Programming
    Replies: 1
    Last Post: 01-27-2002, 10:44 AM
  4. Ornery binary search algorithm
    By Unregistered in forum C Programming
    Replies: 3
    Last Post: 11-17-2001, 02:32 PM
  5. Help with some complex decision making
    By mlupo in forum C Programming
    Replies: 3
    Last Post: 11-17-2001, 12:45 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21