![]() |
| | #1 |
| Registered User Join Date: Jul 2009
Posts: 5
| I have a large number of elements to be stored into vector or basic string(delimited by \n). I need to search for an element and I would like to know what is faster: A: Code: std::string str = "\nElement1\nElement2...element1000\n"
str.find("\nElement3\n");
Code: std::vector<string> v;
v.push_back("Element1"); v.push_back("Element2"); ....
std::find(v.begin(), v.end(), "Element3") != v.end() )
Thanks a lot, Petry |
| petry is offline | |
| | #2 |
| Registered User Join Date: Oct 2008
Posts: 452
| The second is likely to be faster. If we start with \nElement1\nElement2...element1000\n and we search for Element3... For each position in the string we have to start comparing whether the string is equal to to the one we're searching for. That is, we have the number of characters time an equals-test (operator==). In the second case, we only have to compare at the start of each element. So in stead of one per character of the string we only have to the equals-test (operator==) the same amount of times as the number of elements, which is obviously a lot less. |
| EVOEx is offline | |
| | #3 |
| Student Join Date: Aug 2008 Location: UK -> Newcastle
Posts: 156
| With the first method, A, all you are doing is writing a new line and then a string of characters, which would be pretty tedious to find a certain character. The second however, is much more powerful. You are adding another string to the vector, which makes it a lot easier to manipulate.
__________________ MSDN <- Programmers Haven! |
| legit is offline | |
| | #4 |
| Registered User Join Date: Jul 2009
Posts: 5
| I have just tested the difference. Filesize: 8mb Lines: 700.000 Average line length: 15 characters Time to fill: vector: 3843ms string: 3891ms The search time for the last unique element is: vector: 218 string: 172 |
| petry is offline | |
| | #6 |
| Super Moderator Join Date: Aug 2001
Posts: 7,470
| Use a map. If you need to store items and then access them later via some criteria (IE: a key) a map would be better suited for this. If you want something quicker than a map then you could use the non-standard hash_map in MS STL or you could program your own hash container.
__________________ If you aim at everything you will hit something but you won't know what it is. Last edited by Bubba; 07-05-2009 at 10:31 AM. |
| Bubba is offline | |
| | #7 |
| Registered User Join Date: Jul 2009
Posts: 5
| I only to search through the elements, no need to access by key. |
| petry is offline | |
| | #8 |
| C++ Witch Join Date: Oct 2003 Location: Singapore
Posts: 10,352
| Then use std::set.
__________________ C + C++ Compiler: MinGW port of GCC Build + Version Control System: SCons + Bazaar Look up a C/C++ Reference and learn How To Ask Questions The Smart Way |
| laserlight is offline | |
| | #9 |
| Algorithm Dissector Join Date: Dec 2005 Location: New Zealand
Posts: 2,475
| Seconded. Just make sure you use the member version of find, not std::find and you're performance will be far better instead of a tad worse. Code: std::set<std::string> s;
s.insert("Element1"); s.insert("Element2"); ....
if ( s.find("Element3") != s.end() )
__________________ My homepage Advice: Take only as directed - If symptoms persist, please see your debugger |
| iMalc is offline | |
| | #10 |
| Registered User Join Date: Mar 2003 Location: Louisiana
Posts: 926
| However, a set has a few points of interests: You cannot store the same element twice. If you need do this use a multiset It will sort your contents. If you need to keep the ordering use unordered_map (it's in tr1 so if your compiler doesn't support it, use boost::tr1) |
| linuxdude is offline | |
| | #11 |
| Registered User Join Date: Jan 2005
Posts: 7,137
| >> If you need to keep the ordering use unordered_map unordered_map, or unordered_set will not keep the ordering. If you need to keep the ordering of how the items were added then vector is the best choice. The unordered containers are unordered. They aren't sorted but they don't retain the ordering of how you inserted the values (AFAIK). They might be the best option for this as they can have the fastest lookup times compared to the other choices, but you have to make sure you set up the hash properly. If you don't need to keep the elements in the order you added them, and don't want to use the unordered containers (or don't have access to them), there is another option that might be even better than std::set. That option is to use the vector, but before you search you sort the vector. Then use the binary_search algorithm instead of find. This duplicates the benefits of the binary tree that is how std::set is normally implemented, but adds less overhead because the items are stored in an array instead of linked in a tree. Last edited by Daved; 07-06-2009 at 11:54 AM. |
| Daved is offline | |
| | #12 |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,378
| The suggestion to use a std::set is a good one, but it is not the most efficient data structure for this problem. It is probable that many of the strings in your set are prefixes of each other. If this is the case, you should consider using a data structure which takes advantage of that fact. The naive implementation with std::set is inefficient if there are many similar strings with long, identical prefixes. The string comparison operation used during the set.find() operation will spend a lot of time comparing these prefixes. A tree data structure is still the right idea, but you should look at something like a ternary tree or a prefix tree. The lookup/insertion time is still O(log n), but with a far smaller constant factor. The memory usage is vastly improved as well.
__________________ "Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot |
| brewbuck is offline | |
| | #13 | |
| Registered User Join Date: Jul 2009
Posts: 5
| Quote:
Filesize: 14mb Lines: 1.050.000 Average line length: 20 characters Time to fill: vector: 7610ms string: 8906ms set: 19719ms The search time for the last unique element is: vector-find: 297ms vector-sort-binary_search: 41344ms vector-binary_search: 0ms string: 234ms set: 0ms It seems that vector & binary_search is the best couple: the fastest way to fill-in the array and 0ms search time. Sorting slows things a lot. All in all - GREAT SOLUTION! | |
| petry is offline | |
| | #14 |
| Registered User Join Date: Jul 2009
Posts: 5
| I decided too quickly. I looked for more info on binary_search and it works only in sorted array. It worked in my case while the array was not sorted, but it's not stable solution. Since sorting takes a lot of time on large array, I think that std::set is the winner.. Last edited by petry; 07-06-2009 at 03:41 PM. |
| petry is offline | |
| | #15 |
| Registered User Join Date: Jan 2005
Posts: 7,137
| I'm a little surprised the sort takes that long. I would think it would be closer to the set's load time. Perhaps it is because your data is already mostly sorted? My guess is that unordered_set will be the fastest of all the generic solutions, since lookup time and load time are all constant assuming your hash and capacity are good. Depending on your actual input you might be able to get a solution that is notably faster if you create your own structure/algorithm as brewbuck suggests, but I assumed that "Element1","Element2", etc, was just an example and your input isn't necessarily so specific. |
| Daved is offline | |
![]() |
| Thread Tools | |
| Display Modes | |
|