Thread: Search Engine Speed

  1. #1
    Registered User
    Join Date
    Sep 2007
    Posts
    6

    Search Engine Speed

    Hi,

    I am developing an indexer for a search engine. As a part of that I have to create an inverted index over a large collection.

    I am using a nested map structure.

    Outer_Map < String, Object1>
    Class Object1 {
    Inner Map<int docid,long freq>
    }

    I have attached the code below.

    A few notes about the code

    When a word is found by the indexer it checks if the word is already present in reverseDict ....if so we go the object in the value field then increase the freq (value)counter in the inner map by finding the corres docid (passed as filename)

    reverseDict is the main outer map.
    token_post_list(tpl) is an object that has the inner map for each entry in the main outer map(reverseDict).



    Code:
    void buildIndex{
            long temp_tf;
            PostingMap map_posting;
            PostingMap::iterator inner_iter;
    
            iter=reverseDict.find(str);
            if(iter==reverseDict.end()){
                token_post_list tpl;
                map_posting.insert(pair<string, long>(filename,1));
                tpl.setPostingMap(map_posting);
                reverseDict.insert(pair<string, token_post_list> (str,tpl));
            }else{
                map_posting=iter->second.getPostingMap();   // tpl->getPostingMap();
    
                inner_iter=map_posting.find( filename );
                if(inner_iter == map_posting.end()){
                    //word has not been found - insert now
                    map_posting.insert(pair<string, long>(filename,1));
                }else{
                    temp_tf=inner_iter->second;
                    temp_tf++;
                    inner_iter->second=temp_tf;
                    //cout<<"indexer.cpp:: Repeat occurance  tf value :"<<inner_iter->second<<endl;
    
                }
                iter->second.setPostingMap(map_posting);
    
            }

    I find the code based on this to be extremely slow.

    Can you please point out what is slowing this code so enormously. I have seen some pretty fast implementation with the data-structure combination in others.

  2. #2
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Well, it's not probably going to be all that fast but are you sure you are not, for example, making unnecessary copies of large objects?

    For example: this looks suspicious
    Code:
    PostingMap map_posting;
    ...
    map_posting=iter->second.getPostingMap();
    ...
    iter->second.setPostingMap(map_posting);
    If those indeed involve copies, may-be you could provide PostingMap with a member function that takes the filename and handles the operation internally.

    By the way, if I'm not mistaken you can increment the value at a key and insert a new key with value 1 with a single map operation (the value is initialized to 0 in this case and immediately incremented):
    Code:
    map_posting[file]++;
    This should probably search for the right place in the map only once, instead of searching once for find and searching again for insert.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Another code search engine
    By Mario F. in forum Tech Board
    Replies: 0
    Last Post: 12-02-2006, 08:45 AM
  2. ....does anyone know of some Search Engine...
    By GGrrl in forum C++ Programming
    Replies: 4
    Last Post: 03-25-2003, 08:58 AM
  3. Contest Results - May 27, 2002
    By ygfperson in forum A Brief History of Cprogramming.com
    Replies: 18
    Last Post: 06-18-2002, 01:27 PM
  4. Search Engines :: Winsock
    By kuphryn in forum Windows Programming
    Replies: 3
    Last Post: 06-18-2002, 12:31 PM
  5. Sorting strings to speed up search?
    By Mox in forum C++ Programming
    Replies: 5
    Last Post: 09-10-2001, 12:17 PM