Thread: std::map with many elements

  1. #1
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220

    std::map with many elements

    Hi, i am using std::map for many elements almost 20.000. But the insertion of those elements seems to be a litlle bit slow... Is there anything like std::string::reserve method on map? I already looked the references but havenīt found any hint. Ideas?

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    maps are implemented as binary trees, so you can't really reserve memory like you can for something implemented with an array. The reason reserve is used for strings and vectors is to avoid copying of the data over and over as it is being appended. There isn't any copying with a map because the existing elements never have to be moved when a new element is added.

    If you think the slowdown is caused by fragmented memory (which has happened to me with a map), then you can adjust the allocator the map uses to allocate it's memory. I've used the Pool allocator from boost for this in the past. However, the slowdown we noticed was in deletion of the map, not insertion.

    What makes you think the insertion is the source of the slowdown?

  3. #3
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    the insertion speed for a map depends on the number of elements which are already in the map so it's always slow compared to a list for example. thats the price to get the elements sorted. the only way to avoid this is using another type of container which allows insertion in constant time. If you describe the purpose of your map and the operations you run on it (what and how often) maybe a better alternative can be suggested.
    Last edited by pheres; 12-01-2007 at 04:27 AM.

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    But 20000 elements is nothing. That's at worst 15 comparisons. So unless your comparison function is really slow (what are you storing?), it shouldn't be noticeably slow.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #5
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    I haven´t defined a comparison function... Ok, the purpose of the map is to store a indefined number of words 20.000 ~ 500.000 and count the times that that word appears on a text. So i was using map to store words like: mymap["the"]++;. And i noticed the slowdown on my routine of getting the words from the text since i have added the insertion routine to my map...

  6. #6
    The larch
    Join Date
    May 2006
    Posts
    3,573
    May-be you could post the code / insertion routine?
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  7. #7
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Sure, it have some code that uses boost too:

    Code:
    std::map<std::string, int> dictionary;
    boost::regex expression("([a-z]+)", regex::perl|regex::icase);
    ...
    Code:
    	ifstream file(filename);
    	cmatch result;
    	char data[10000];
    
    	while (!file.eof())
    	{
    		file.getline(data, 10000);
    
    		string line(data);
    
    		sregex_token_iterator begin(line.begin(), line.end(), expression, 1);
    		sregex_token_iterator end;
    
    		while (begin != end)
    		{
    			dictionary[(*begin++)]++;
    		}
    	}

  8. #8
    Registered User
    Join Date
    Aug 2005
    Location
    Austria
    Posts
    1,990
    I'v never used boost::regexp but
    Code:
    		sregex_token_iterator begin(line.begin(), line.end(), expression, 1);
    		sregex_token_iterator end;
    
    		while (begin != end)
    end seems to be uninitialized.
    Kurt

  9. #9
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Oh, that&#180;s the way it should be used ;P

  10. #10
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    So you're saying that if you take out the dictionary[(*begin++)]++; line it works much faster? How much faster?

    Perhaps you can use a profiler to see exactly where the slowdown is. A decent one should show you the call even if it comes from the map class. There are free profilers available if you don't want to spend money.

    Or you can add timers to the code to make sure the problem is there and not somewhere else. For example, if you're entire program just takes longer when you add that line, it still might be a memory fragmentation problem that occurs when the map is destroyed, rather than exactly at that line of code. The memory fragmentation issue has a solution.

    BTW, because you're potentially mixing inserts and lookups, I don't think any another container makes sense over a map. If you don't need these things sorted then perhaps a hash based map would be better (use unordered_map from std::tr1 or boost). The catch with that is that you need a pretty reasonable estimate of how many entries you'll have before you start or a maximum size that you can use every time.

  11. #11
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    Can you indicate any win32 profiler? I am unable to use my gprof right now...

  12. #12
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    I honestly don't know any off the top of my head without doing a full search myself.

  13. #13
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    there is a free version of VTune from intel

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by pheres View Post
    there is a free version of VTune from intel
    Which doesn't work on AMD processors, but AMD has a similar tool called CodeAnalyst [in fact, the latter can do a lot more, because you also get a instruction simulator that can analyze EXACTLY what happens in any piece of your code - it just takes about 10-100x more time to run the code, but you only do that once you have figured out which bits are really you bottleneck, and you only do it enough to find the answer to why it's slow (unless you already know by the time you get to that piece of code!)].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  15. #15
    Registered User
    Join Date
    Oct 2005
    Location
    Brasil
    Posts
    220
    okay, with this piece of code:
    Code:
    		while (begin != end)
    		{
    			dictionary[(*begin++)]++;
    		}
    The routine took 2.05 seconds, and without it (commenting it) i get 0.42 seconds...

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. using realloc for a dynamically growing array
    By broli86 in forum C Programming
    Replies: 10
    Last Post: 06-27-2008, 05:37 AM
  2. Randomly rearranging the elements in a vector
    By Signifier in forum C++ Programming
    Replies: 11
    Last Post: 08-01-2007, 12:21 PM
  3. delete[]'ing std::map elements... sort of.
    By cboard_member in forum C++ Programming
    Replies: 2
    Last Post: 07-12-2006, 12:21 PM
  4. way to check 3 elements of array and set 4th
    By Syneris in forum C++ Programming
    Replies: 3
    Last Post: 01-09-2006, 11:30 AM
  5. Replies: 2
    Last Post: 08-03-2003, 10:01 AM