Thread: std::map with many elements

  1. #16
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Ok, that's a good start. Now put timing data around that loop to see if the slow down is in the loop. In fact, just a cout before and after the outer while loop might do it. Remember, I'm thinking one potential problem is in the destruction of a large map, so commenting out the code that fills the large map would get rid of that slowdown as well.

  2. #17
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Oh, i am sorry, i wasn´t clear:
    Total time execution with insertion: 2.08s
    Loop time with insertion: 2.06s

    Total time execution without insertion: 0.41s
    Loop time without insertion: 0.41s

  3. #18
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Scarvenger View Post
    okay, with this piece of code:
    Code:
    		while (begin != end)
    		{
    			dictionary[(*begin++)]++;
    		}
    The routine took 2.05 seconds, and without it (commenting it) i get 0.42 seconds...
    What timing do you get if you just change that loop to this?:
    Code:
    		while (begin != end)
    		{
    			*begin++;
    		}
    Then tell us how long it takes if you instead do this:
    Code:
    		std::map<std::string, int> iter = dictionary.begin();
    		while (iter != dictionary.end())
    		{
    			dictionary[iter++]++;
    		}
    That should tell us whether the slowness is mainly related to incrementing the sregex_token_iterator, or whether it is in fact the actual insertions.
    Last edited by iMalc; 12-01-2007 at 06:26 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #19
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Code:
                    while (begin != end)
    		{
    			dictionary[(*begin++)]++;
    		}
    0.951s

    Code:
    		while (begin != end)
    		{
    			*begin++;
    		}
    0.781s

    So, yes, now i suppose that the problem is in the sregex_token_iterator... Maybe i will think anything to replace it...

  5. #20
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Okay, i have changed the whole code that used the boost::regex library to a pure std approach, the code is the following:
    Code:
    	ifstream file(filename);
    	char data[3000];
    
    	while (!file.eof())
    	{
    		file.getline(data, 3000);
    		string line(data);
    
    		stringToLower(line);
    
    		unsigned int position = 0;
    		while ((position = line.find_first_of(delimiters, position)) != -1)
    		{
    			int endPosition = line.find_first_not_of(delimiters, position);
    			string result      = line.substr(position, endPosition - position);
    			position             = endPosition;
    
    			dictionary[result]++;
    		}
    	}
    
    	return true;
    This seems to be a much faster aproach than the boost::regex since it gives me total execution time of ~1.39s while the boost::regex was giving me ~4.01s.

    Now i would like to squeeze this program to get the last drop of performance from it... Suggestions?
    [edit]
    Readability and good practices sugestion are welcome too
    [/edit]
    Last edited by Scarvenger; 12-01-2007 at 10:20 PM.

  6. #21
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    >> Now i would like to squeeze this program to get the last drop of performance from it... Suggestions?

    Just a guess: put line.substr(position, endPosition - position) into the dictionary[] call instead of result and remove result entirely. Also you'd have to move the assignment of position to after that call as well. A good optimizing compiler probably would already have done that itself, but it's worth a try.

    By the way, are the timings on a release build with optimizations on? That makes a big difference. All timing tests should be done with release builds.


    >> Readability and good practices sugestion are welcome too

    I'd use string::npos instead of -1 and string::size_type instead of unsigned int.

    Why are you using the C-string version of getline? Are you purposefully trying to cap the input at 3000 characters? If so, that's fine. If you made it 3000 because you want to make sure you get the entire line, then use the C++ string version of getline: getline(file, line).

    Whichever version of getline you use, you should not use eof() to control the loop. You need to check to see if the reading function (getline in this case) succeeds before using it. The most common and easy way to do this is to move the reading function into the while control:
    Code:
    string line;
    while (getline(file, line))
    {
        // ...

  7. #22
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    To get more performance you'd need a hash table instead of the balanced tree. See if your compiler provides std::tr1::unordered_map, or perhaps stdext::hash_map.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  8. #23
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I would prefer boost's unordered_map implementation before using stdext::hash_map. The standard hash_map has pretty much been set now, so it only makes sense to start using that instead of the older compiler specific variations.

    Also remember what I mentioned earlier about knowing a good size estimate before you start reading. If you don't have one, then you might either waste a ton of memory or lose all the performance benefits of the hash map (or worse have significantly worse performance). A good size to use is 50&#37; more than how many elements will be in the map. Making that number prime or with few factors might help as well.

  9. #24
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Except that Boost doesn't have an unordered_map implementation in its TR1 emulation. The only hash map in Boost is a multi_index container with a hash index.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  10. #25
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    Really? I could have sworn I'd used or at least read about their implementation before. I guess I confused that with something else.

    That's too bad. It took long enough to get a standardized interface that it would be nice if there was a cross-platform implementation available that implements it.

  11. #26
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Or there is talk about their implementation. In fact, there is an implementation - but it's in the Sandbox, not yet part of Boost proper. 1.34 doesn't have it, 1.35 won't have it either.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. using realloc for a dynamically growing array
    By broli86 in forum C Programming
    Replies: 10
    Last Post: 06-27-2008, 05:37 AM
  2. Randomly rearranging the elements in a vector
    By Signifier in forum C++ Programming
    Replies: 11
    Last Post: 08-01-2007, 12:21 PM
  3. delete[]'ing std::map elements... sort of.
    By cboard_member in forum C++ Programming
    Replies: 2
    Last Post: 07-12-2006, 12:21 PM
  4. way to check 3 elements of array and set 4th
    By Syneris in forum C++ Programming
    Replies: 3
    Last Post: 01-09-2006, 11:30 AM
  5. Replies: 2
    Last Post: 08-03-2003, 10:01 AM