C Board  

Go Back   C Board > General Programming Boards > C++ Programming

Reply
 
LinkBack Thread Tools Display Modes
Old 07-05-2009, 02:17 AM   #1
Registered User
 
Join Date: Jul 2009
Posts: 5
Question std::string::find vs std::find

Hi All:

I have a large number of elements to be stored into vector or basic string(delimited by \n). I need to search for an element and I would like to know what is faster:

A:
Code:
std::string str = "\nElement1\nElement2...element1000\n"
str.find("\nElement3\n");
B:
Code:
std::vector<string> v;
v.push_back("Element1"); v.push_back("Element2");  ....
std::find(v.begin(), v.end(), "Element3") != v.end() )
I'm only interested in search time, I don't need to access any of the elements for reading, modifying or removing.

Thanks a lot,
Petry
petry is offline   Reply With Quote
Old 07-05-2009, 04:17 AM   #2
Registered User
 
Join Date: Oct 2008
Posts: 452
The second is likely to be faster.

If we start with \nElement1\nElement2...element1000\n and we search for Element3... For each position in the string we have to start comparing whether the string is equal to to the one we're searching for. That is, we have the number of characters time an equals-test (operator==).

In the second case, we only have to compare at the start of each element. So in stead of one per character of the string we only have to the equals-test (operator==) the same amount of times as the number of elements, which is obviously a lot less.
EVOEx is offline   Reply With Quote
Old 07-05-2009, 04:50 AM   #3
Student
 
legit's Avatar
 
Join Date: Aug 2008
Location: UK -> Newcastle
Posts: 156
With the first method, A, all you are doing is writing a new line and then a string of characters, which would be pretty tedious to find a certain character. The second however, is much more powerful. You are adding another string to the vector, which makes it a lot easier to manipulate.
__________________
MSDN <- Programmers Haven!
legit is offline   Reply With Quote
Old 07-05-2009, 07:50 AM   #4
Registered User
 
Join Date: Jul 2009
Posts: 5
I have just tested the difference.

Filesize: 8mb
Lines: 700.000
Average line length: 15 characters

Time to fill:
vector: 3843ms
string: 3891ms

The search time for the last unique element is:
vector: 218
string: 172
petry is offline   Reply With Quote
Old 07-05-2009, 07:58 AM   #5
Student
 
legit's Avatar
 
Join Date: Aug 2008
Location: UK -> Newcastle
Posts: 156
Wow, I wasn't expecting that.
__________________
MSDN <- Programmers Haven!
legit is offline   Reply With Quote
Old 07-05-2009, 10:02 AM   #6
Super Moderator
 
Bubba's Avatar
 
Join Date: Aug 2001
Posts: 7,470
Use a map.

If you need to store items and then access them later via some criteria (IE: a key) a map would be better suited for this. If you want something quicker than a map then you could use the non-standard hash_map in MS STL or you could program your own hash container.
__________________
If you aim at everything you will hit something but you won't know what it is.

Last edited by Bubba; 07-05-2009 at 10:31 AM.
Bubba is offline   Reply With Quote
Old 07-05-2009, 12:31 PM   #7
Registered User
 
Join Date: Jul 2009
Posts: 5
I only to search through the elements, no need to access by key.
petry is offline   Reply With Quote
Old 07-05-2009, 12:34 PM   #8
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 10,352
Then use std::set.
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is offline   Reply With Quote
Old 07-06-2009, 12:18 AM   #9
Algorithm Dissector
 
iMalc's Avatar
 
Join Date: Dec 2005
Location: New Zealand
Posts: 2,475
Quote:
Originally Posted by laserlight View Post
Then use std::set.
Seconded.
Just make sure you use the member version of find, not std::find and you're performance will be far better instead of a tad worse.
Code:
std::set<std::string> s;
s.insert("Element1"); s.insert("Element2");  ....
if ( s.find("Element3") != s.end() )
__________________
My homepage
Advice: Take only as directed - If symptoms persist, please see your debugger
iMalc is offline   Reply With Quote
Old 07-06-2009, 11:26 AM   #10
Registered User
 
linuxdude's Avatar
 
Join Date: Mar 2003
Location: Louisiana
Posts: 926
However, a set has a few points of interests:
You cannot store the same element twice. If you need do this use a multiset
It will sort your contents. If you need to keep the ordering use unordered_map (it's in tr1 so if your compiler doesn't support it, use boost::tr1)
linuxdude is offline   Reply With Quote
Old 07-06-2009, 11:50 AM   #11
Registered User
 
Join Date: Jan 2005
Posts: 7,137
>> If you need to keep the ordering use unordered_map
unordered_map, or unordered_set will not keep the ordering. If you need to keep the ordering of how the items were added then vector is the best choice.

The unordered containers are unordered. They aren't sorted but they don't retain the ordering of how you inserted the values (AFAIK). They might be the best option for this as they can have the fastest lookup times compared to the other choices, but you have to make sure you set up the hash properly.

If you don't need to keep the elements in the order you added them, and don't want to use the unordered containers (or don't have access to them), there is another option that might be even better than std::set. That option is to use the vector, but before you search you sort the vector. Then use the binary_search algorithm instead of find. This duplicates the benefits of the binary tree that is how std::set is normally implemented, but adds less overhead because the items are stored in an array instead of linked in a tree.

Last edited by Daved; 07-06-2009 at 11:54 AM.
Daved is offline   Reply With Quote
Old 07-06-2009, 12:57 PM   #12
Senior software engineer
 
brewbuck's Avatar
 
Join Date: Mar 2007
Location: Portland, OR
Posts: 5,378
The suggestion to use a std::set is a good one, but it is not the most efficient data structure for this problem. It is probable that many of the strings in your set are prefixes of each other. If this is the case, you should consider using a data structure which takes advantage of that fact.

The naive implementation with std::set is inefficient if there are many similar strings with long, identical prefixes. The string comparison operation used during the set.find() operation will spend a lot of time comparing these prefixes. A tree data structure is still the right idea, but you should look at something like a ternary tree or a prefix tree.

The lookup/insertion time is still O(log n), but with a far smaller constant factor. The memory usage is vastly improved as well.
__________________
"Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot
brewbuck is offline   Reply With Quote
Old 07-06-2009, 02:46 PM   #13
Registered User
 
Join Date: Jul 2009
Posts: 5
Quote:
Originally Posted by Daved View Post
>>That option is to use the vector, but before you search you sort the vector. Then use the binary_search algorithm instead of find. This duplicates the benefits of the binary tree that is how std::set is normally implemented, but adds less overhead because the items are stored in an array instead of linked in a tree.
I only need to check if an element exists no need to keep the order. I made a new test including set and Daved's suggestion, the results are below:

Filesize: 14mb
Lines: 1.050.000
Average line length: 20 characters

Time to fill:
vector: 7610ms
string: 8906ms
set: 19719ms

The search time for the last unique element is:
vector-find: 297ms
vector-sort-binary_search: 41344ms
vector-binary_search: 0ms
string: 234ms
set: 0ms

It seems that vector & binary_search is the best couple: the fastest way to fill-in the array and 0ms search time. Sorting slows things a lot. All in all - GREAT SOLUTION!
petry is offline   Reply With Quote
Old 07-06-2009, 03:02 PM   #14
Registered User
 
Join Date: Jul 2009
Posts: 5
I decided too quickly. I looked for more info on binary_search and it works only in sorted array. It worked in my case while the array was not sorted, but it's not stable solution.

Since sorting takes a lot of time on large array, I think that std::set is the winner..

Last edited by petry; 07-06-2009 at 03:41 PM.
petry is offline   Reply With Quote
Old 07-06-2009, 03:09 PM   #15
Registered User
 
Join Date: Jan 2005
Posts: 7,137
I'm a little surprised the sort takes that long. I would think it would be closer to the set's load time. Perhaps it is because your data is already mostly sorted?

My guess is that unordered_set will be the fastest of all the generic solutions, since lookup time and load time are all constant assuming your hash and capacity are good. Depending on your actual input you might be able to get a solution that is notably faster if you create your own structure/algorithm as brewbuck suggests, but I assumed that "Element1","Element2", etc, was just an example and your input isn't necessarily so specific.
Daved is offline   Reply With Quote
Reply

Thread Tools
Display Modes

Forum Jump


All times are GMT -6. The time now is 05:31 PM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.0 RC2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22