Thread: Optimizing my program

  1. #1
    Registered User
    Join Date
    Jan 2005
    Posts
    18

    Optimizing my program

    Hi, my program works fine but its not very efficient. It struggles with large files. I was pretty proud of myself that i managed to code it well,but is there anyway of making it better?

    this could help me become a better programmer...
    kind regards
    JoHN

    Code:
    #include <iostream> 
    #include <fstream> 
    #include <string> 
    #include <map> 
    #include <algorithm> 
    #include <vector> 
    using namespace std; 
    
    typedef vector<int> vect_int; 
    map<string,vect_int*> *ptMap;  // need global ptr to use in std::sort compare predicate. 
    string readFile(); 
    
    
    bool sort_map_greater( string item1, string item2) 
    { 
       return (*ptMap)[item1]->size()>(*ptMap)[item2]->size(); 
    } 
    
    
    int main( int argc, char *argv[])  // 
    { 
       // TODO:check that filename was given as command line argument here 
    
       char charbuf; 
       //string sequence; 
       //string inputfile(argv[1]); 
       map<string, vect_int*> repeats; 
       vector<string> index; // to be used for sorting 
    
       ptMap = &repeats; 
        
       string sequence(readFile()); 
    
       // TODO: add functionality to input get k 
       // get topmost 
       int k; 
       cout << "Enter k: "; 
       cin >> k; 
       //TODO: check valid input 
       cout << "Enter top n repeats to show: "; 
       int topmost; 
       cin >> topmost; 
       //TODO: check for valid input 
    
       for (int i = 0;i<sequence.size()-k;++i) 
       { 
          string sub = sequence.substr(i,k);//get substring based on k size; 
          if (repeats.find(sub)==repeats.end())// have we already mapped it? 
          { 
             //no, then add key with new vector 
             repeats[sub]=new vector<int>; 
             index.push_back(sub);//also add string to our index while we are at it. 
          } 
          repeats[sub]->push_back(i);// add position to hash 
       } 
    
       // ok now that we got them in 
       // need to sort them to get topmost 
    
       sort(index.begin(), index.end(),sort_map_greater);//std::sort makes things easy 
    
        
       //let's output the result 
       for (int i = 0; (i < topmost) && (i < index.size()); i++) 
       { 
          cout << "Number of repeats in order: " << index[i] << " with " << repeats[index[i]]->size() << endl; 
          for (vector<int>::iterator it = repeats[index[i]]->begin();it != repeats[index[i]]->end();it++) 
          { 
             cout << *it << " "; 
          } 
          cout << endl; 
       } 
    
    
       //finally we need to delete all those new vectors 
       for (map<string, vect_int*>::iterator mIt = repeats.begin(); mIt!= repeats.end();mIt++) 
       { 
          delete mIt->second; 
       } 
    }
    
    /*****************************************************************************/
    string readFile() {
    
      string filename;  
      int j;
      string information;
      char tmp = 0;        
      string   data("");   
    
    
    
      
      cout << "Enter filename: ";
      cin >> filename;
    
      ifstream infile(filename.c_str());
    
     for(j=0; j < 3; j++)
     {
       cout << "." << endl;
     }
    
     if( !filename.c_str()    )
       {
         cout << filename.c_str() << " has not been loaded " << endl;
         cin.get();
       }
     else
       {
         cout << filename.c_str() << " has been loaded" <<  endl;
       }
     
    while(!infile.eof()){
       infile.get(tmp);
       tmp= toupper(tmp);
        if ((tmp == 'A') || (tmp == 'T') || (tmp == 'C') || (tmp == 'G')) 
          {
          data += tmp;
        }
     }
     
     return data;
     
    }

  2. #2
    Registered User
    Join Date
    Feb 2005
    Location
    oslo
    Posts
    26
    this looks pointless?

    Code:
     for(j=0; j < 3; j++)
     {
       cout << "." << endl;
     }

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    RDTSC
    Sprinkle some of these around in 'begin-time' 'end-time' pairs, and see what is happening.

    Reading large files is in itself expensive, and there isn't a lot you can do to make disks spin any quicker.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    Code:
    while(!infile.eof()){
       infile.get(tmp);
       tmp= toupper(tmp);
        if ((tmp == 'A') || (tmp == 'T') || (tmp == 'C') || (tmp == 'G')) 
          {
          data += tmp;
        }
     }
    It looks like this could be optimized, but as Salem suggested, first find out what section is taking time. If readFile() is slow, then that can be optimized. But without knowing what's taking time, it's a guessing game.

  5. #5
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The only optimization you might want to look into is using the C standard library array-based I/O functions and then convert that array into a vector later. Other than that, there's not much you can do. Large files will take more processing power and there isn't much you can do really.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    data += tmp;
    Building a strings several million chars long, one character at a time is going to be expensive IMO.

    Do you know anything about the file, like is structured in handy rows of 80 characters perhaps?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Jan 2005
    Posts
    18
    Quote Originally Posted by Salem
    data += tmp;
    Building a strings several million chars long, one character at a time is going to be expensive IMO.

    Do you know anything about the file, like is structured in handy rows of 80 characters perhaps?
    hey guys,thanks for some tips..

    the data looks like:
    [code]
    AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAA A
    AGAGTGTCTGATAGCAGC
    TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG TCACTAAATACTTTAACCAA
    TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCA TGAAACGCATTAGCACCACC
    ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGT ACAGGAAACACAGAAAAAAG
    CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTA ACAACCATGCGAGTGTTGAA
    GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCG ATATTCTGGAAAGCAATGCC
    AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAA CCACCTGGTGGCGATGATTG
    AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA CGTATTTTTGCCGAACTTTT
    [\code]

    but thousands and thousands of lines below!

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Reading each line, validating the line and appending the line would be my first pass.

    As would rejecting the whole line if it contained any invalid characters.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Issue with program that's calling a function and has a loop
    By tigerfansince84 in forum C++ Programming
    Replies: 9
    Last Post: 11-12-2008, 01:38 PM
  2. Need help with a program, theres something in it for you
    By engstudent363 in forum C Programming
    Replies: 1
    Last Post: 02-29-2008, 01:41 PM
  3. Replies: 4
    Last Post: 02-21-2008, 10:39 AM
  4. My program, anyhelp
    By @licomb in forum C Programming
    Replies: 14
    Last Post: 08-14-2001, 10:04 PM