Hi,
I am developing an indexer for a search engine. As part of that, I have to build an inverted index over a large collection.
I am using a nested map structure:
Outer_Map<String, Object1>
class Object1 {
    Inner_Map<int docid, long freq>
}
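A minimal sketch of this layout in C++, assuming std::map at both levels and a string docid (the file name), as the code further down uses. TokenPostList, ReverseDict and termFrequency are illustrative names, not the actual declarations from the indexer:

```cpp
#include <map>
#include <string>

// Inner map: docid (assumed here to be the file name) -> term frequency.
typedef std::map<std::string, long> PostingMap;

// Illustrative stand-in for Object1: holds the inner map for one term.
struct TokenPostList {
    PostingMap postings;
};

// Outer map: term -> its posting list.
typedef std::map<std::string, TokenPostList> ReverseDict;

// Looks up how often `term` occurs in `docid`; returns 0 if either is absent.
long termFrequency(const ReverseDict& dict,
                   const std::string& term,
                   const std::string& docid) {
    ReverseDict::const_iterator outer = dict.find(term);
    if (outer == dict.end()) {
        return 0;  // term was never indexed
    }
    const PostingMap& postings = outer->second.postings;
    PostingMap::const_iterator inner = postings.find(docid);
    return inner == postings.end() ? 0 : inner->second;
}
```

With this layout, a lookup costs one search in the outer map plus one search in the inner map for that term.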
I have attached the code below.
A few notes about the code:
When the indexer finds a word, it checks whether the word is already present in reverseDict. If so, it goes to the object in the value field and increments the freq counter (the value) in the inner map for the corresponding docid (passed as filename). If not, it adds a new entry for the word with a frequency of 1 for that docid.
reverseDict is the main outer map.
token_post_list (tpl) is the object that holds the inner map for each entry in the main outer map (reverseDict).
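The steps in these notes can be sketched as a small standalone function. This is only an illustration under stated assumptions: Dict and recordOccurrence are hypothetical names, the inner map is stored directly in the outer map instead of inside token_post_list, and the inner map is updated in place through a reference, which differs from the getPostingMap/setPostingMap calls in the actual code.

```cpp
#include <map>
#include <string>
#include <utility>

// docid (file name) -> term frequency
typedef std::map<std::string, long> PostingMap;
// Simplified outer map (assumption): word -> inner map, with no wrapper object.
typedef std::map<std::string, PostingMap> Dict;

// Records one occurrence of `word` in `filename` and returns the new frequency.
long recordOccurrence(Dict& dict,
                      const std::string& word,
                      const std::string& filename) {
    Dict::iterator outer = dict.find(word);
    if (outer == dict.end()) {
        // Word not seen before: add it with an empty inner map.
        outer = dict.insert(std::make_pair(word, PostingMap())).first;
    }
    // Work on the stored inner map itself, not on a local copy.
    PostingMap& postings = outer->second;
    PostingMap::iterator inner = postings.find(filename);
    if (inner == postings.end()) {
        // First occurrence of this word in this document.
        postings.insert(std::make_pair(filename, 1L));
        return 1;
    }
    // Repeat occurrence: increment the stored counter.
    return ++inner->second;
}
```

Calling recordOccurrence(dict, "apple", "doc1.txt") twice on an empty dict leaves a single outer entry for "apple" whose inner map holds doc1.txt with a frequency of 2.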
Code:
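// Parameter list is assumed: str is the word just read, filename identifies the document.
void buildIndex(const string& str, const string& filename){
    long temp_tf;
    PostingMap map_posting;
    PostingMap::iterator inner_iter;
    // iter is an iterator over reverseDict, declared elsewhere
    iter=reverseDict.find(str);
    if(iter==reverseDict.end()){
        // word is not in the dictionary yet: create its posting list
        token_post_list tpl;
        map_posting.insert(pair<string, long>(filename,1));
        tpl.setPostingMap(map_posting);
        reverseDict.insert(pair<string, token_post_list> (str,tpl));
    }else{
        // word is already indexed: fetch its posting map
        map_posting=iter->second.getPostingMap(); // tpl->getPostingMap();
        inner_iter=map_posting.find( filename );
        if(inner_iter == map_posting.end()){
            // this document has not been seen for the word - insert now
            map_posting.insert(pair<string, long>(filename,1));
        }else{
            temp_tf=inner_iter->second;
            temp_tf++;
            inner_iter->second=temp_tf;
            //cout<<"indexer.cpp:: Repeat occurance tf value :"<<inner_iter->second<<endl;
        }
        // store the updated posting map back in the entry
        iter->second.setPostingMap(map_posting);
    }
}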
I find that code based on this approach is extremely slow.
Can you please point out what is slowing it down so much? I have seen some pretty fast implementations by others that use the same combination of data structures.