Thread: std::string::find vs std::find

  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    5

    Question std::string::find vs std::find

    Hi All:

    I have a large number of elements to be stored into vector or basic string(delimited by \n). I need to search for an element and I would like to know what is faster:

    A:
    Code:
    std::string str = "\nElement1\nElement2...element1000\n"
    str.find("\nElement3\n");
    B:
    Code:
    std::vector<string> v;
    v.push_back("Element1"); v.push_back("Element2");  ....
    std::find(v.begin(), v.end(), "Element3") != v.end() )
    I'm only interested in search time, I don't need to access any of the elements for reading, modifying or removing.

    Thanks a lot,
    Petry

  2. #2
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    The second is likely to be faster.

    If we start with \nElement1\nElement2...element1000\n and we search for Element3... For each position in the string we have to start comparing whether the string is equal to to the one we're searching for. That is, we have the number of characters time an equals-test (operator==).

    In the second case, we only have to compare at the start of each element. So in stead of one per character of the string we only have to the equals-test (operator==) the same amount of times as the number of elements, which is obviously a lot less.

  3. #3
    Student legit's Avatar
    Join Date
    Aug 2008
    Location
    UK -> Newcastle
    Posts
    156
    With the first method, A, all you are doing is writing a new line and then a string of characters, which would be pretty tedious to find a certain character. The second however, is much more powerful. You are adding another string to the vector, which makes it a lot easier to manipulate.
    MSDN <- Programmers Haven!

  4. #4
    Registered User
    Join Date
    Jul 2009
    Posts
    5
    I have just tested the difference.

    Filesize: 8mb
    Lines: 700.000
    Average line length: 15 characters

    Time to fill:
    vector: 3843ms
    string: 3891ms

    The search time for the last unique element is:
    vector: 218
    string: 172

  5. #5
    Student legit's Avatar
    Join Date
    Aug 2008
    Location
    UK -> Newcastle
    Posts
    156
    Wow, I wasn't expecting that.
    MSDN <- Programmers Haven!

  6. #6
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Use a map.

    If you need to store items and then access them later via some criteria (IE: a key) a map would be better suited for this. If you want something quicker than a map then you could use the non-standard hash_map in MS STL or you could program your own hash container.
    Last edited by VirtualAce; 07-05-2009 at 10:31 AM.

  7. #7
    Registered User
    Join Date
    Jul 2009
    Posts
    5
    I only to search through the elements, no need to access by key.

  8. #8
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Then use std::set.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by laserlight View Post
    Then use std::set.
    Seconded.
    Just make sure you use the member version of find, not std::find and you're performance will be far better instead of a tad worse.
    Code:
    std::set<std::string> s;
    s.insert("Element1"); s.insert("Element2");  ....
    if ( s.find("Element3") != s.end() )
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User linuxdude's Avatar
    Join Date
    Mar 2003
    Location
    Louisiana
    Posts
    926
    However, a set has a few points of interests:
    You cannot store the same element twice. If you need do this use a multiset
    It will sort your contents. If you need to keep the ordering use unordered_map (it's in tr1 so if your compiler doesn't support it, use boost::tr1)

  11. #11
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> If you need to keep the ordering use unordered_map
    unordered_map, or unordered_set will not keep the ordering. If you need to keep the ordering of how the items were added then vector is the best choice.

    The unordered containers are unordered. They aren't sorted but they don't retain the ordering of how you inserted the values (AFAIK). They might be the best option for this as they can have the fastest lookup times compared to the other choices, but you have to make sure you set up the hash properly.

    If you don't need to keep the elements in the order you added them, and don't want to use the unordered containers (or don't have access to them), there is another option that might be even better than std::set. That option is to use the vector, but before you search you sort the vector. Then use the binary_search algorithm instead of find. This duplicates the benefits of the binary tree that is how std::set is normally implemented, but adds less overhead because the items are stored in an array instead of linked in a tree.
    Last edited by Daved; 07-06-2009 at 11:54 AM.

  12. #12
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    The suggestion to use a std::set is a good one, but it is not the most efficient data structure for this problem. It is probable that many of the strings in your set are prefixes of each other. If this is the case, you should consider using a data structure which takes advantage of that fact.

    The naive implementation with std::set is inefficient if there are many similar strings with long, identical prefixes. The string comparison operation used during the set.find() operation will spend a lot of time comparing these prefixes. A tree data structure is still the right idea, but you should look at something like a ternary tree or a prefix tree.

    The lookup/insertion time is still O(log n), but with a far smaller constant factor. The memory usage is vastly improved as well.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  13. #13
    Registered User
    Join Date
    Jul 2009
    Posts
    5
    Quote Originally Posted by Daved View Post
    >>That option is to use the vector, but before you search you sort the vector. Then use the binary_search algorithm instead of find. This duplicates the benefits of the binary tree that is how std::set is normally implemented, but adds less overhead because the items are stored in an array instead of linked in a tree.
    I only need to check if an element exists no need to keep the order. I made a new test including set and Daved's suggestion, the results are below:

    Filesize: 14mb
    Lines: 1.050.000
    Average line length: 20 characters

    Time to fill:
    vector: 7610ms
    string: 8906ms
    set: 19719ms

    The search time for the last unique element is:
    vector-find: 297ms
    vector-sort-binary_search: 41344ms
    vector-binary_search: 0ms
    string: 234ms
    set: 0ms

    It seems that vector & binary_search is the best couple: the fastest way to fill-in the array and 0ms search time. Sorting slows things a lot. All in all - GREAT SOLUTION!

  14. #14
    Registered User
    Join Date
    Jul 2009
    Posts
    5
    I decided too quickly. I looked for more info on binary_search and it works only in sorted array. It worked in my case while the array was not sorted, but it's not stable solution.

    Since sorting takes a lot of time on large array, I think that std::set is the winner..
    Last edited by petry; 07-06-2009 at 03:41 PM.

  15. #15
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I'm a little surprised the sort takes that long. I would think it would be closer to the set's load time. Perhaps it is because your data is already mostly sorted?

    My guess is that unordered_set will be the fastest of all the generic solutions, since lookup time and load time are all constant assuming your hash and capacity are good. Depending on your actual input you might be able to get a solution that is notably faster if you create your own structure/algorithm as brewbuck suggests, but I assumed that "Element1","Element2", etc, was just an example and your input isn't necessarily so specific.

Popular pages Recent additions subscribe to a feed