Thread: choosing right sort algorithm

  1. #1
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715

    choosing right sort algorithm

    Hello,
    I have a txt file with 5-6 thousands lines of data in format (e.g.)
    Text 12345 Text 12:34:56
    What would be your approach if you're asked to sort this lines with respect to second part (12345 in this examole). Is it good idea to create structures and then sort them and finaly rewrite file again. Maybe there is a clever way?
    Also what sort algorithm woud you choose. I woud try with insertion sort and heap sort. In general what algorithm would be most suited for this example in your opinion?
    Thanks
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    sort -k2n inputfile.txt > outputfile.txt
    On any decent UNIX command line ought to do the job.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Nov 2001
    Posts
    1,348
    map

    requires disk space for sorted data

    Kuphryn

  4. #4
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    I don't have linux installed right now, so Salem's recommendations is not feasibile.I wrote program that reads lines, and store it in map container, and thet overwrite same file. If someone have better idea please suggest me.
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    *shrugs*
    cygwin
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Okay Salem thanks, however I'm trying to make program myself.
    I now that map and multimap containers use internal balanced binary tree, and as such they're very good for sorting jobs.
    However map doesn't allow duplicates and both expect pair(key/value) and I have 4 values in one line of text.
    One of possible solutions is to use linked list but that would be a lof of work.
    Here's what I manage to do using two multimap containers:
    Code:
    #include <iostream>
    #include <string> 
    #include <fstream>
    #include <map>
    using namespace std;
    
    int main ( )
    {
    	multimap < string, string > cont;
    	multimap < string, string > cont2;
    	string first, second, third, fourth;
    	string value;
    	ifstream file ("Data.txt");
    	
    	while (file >> first >> second >> third >> fourth)
    	{
    		value = third + " " + fourth;
    		cont.insert (make_pair(second, first));
    		cont2.insert (make_pair(second, value));
    	}
    
    	multimap <string, string >::iterator it, it2;
    	ofstream out_file ("Res.txt");
    
    	for (it = cont.begin(), it2 = cont2.begin(); it != cont.end(); ++it, ++it2)
    	{
    		out_file << it->second << " " << it->first << " " <<it2 -> second<< endl;
    	}
    }
    Thanks for idea and here's code, I hope someone will find it useful.
    Last edited by Micko; 05-27-2006 at 01:03 PM. Reason: correction of using eof to control loop
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  7. #7
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    I don't know how your data file is formatted exactly, but using eof() to control a while loop usually doesn't work very well. Check the FAQ. That said, the STL has lots of algorithms, you should try using std::sort()
    Last edited by whiteflags; 05-27-2006 at 11:38 AM.

  8. #8
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Thanks citizen, I forgot about it, now it is corrected. Why using sort:: std from algorithm library? Using maps/multimaps can be very powerfull since data are sorted immediately when inserting into container. Unless I'm very wrong this is usually faster than reading data int some kind of array of structures and then sorting it.
    Cheers

    - Micko
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  9. #9
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Well you asked what sort I would use, the answer is std::sort() though it seems that you found a better way of doing it. I read the tutorial from this site a while ago, but that left an impression on me that multimaps were a bad choice unless you wanted an associative array. It seems like the author of the tutorial was biased though, and possibly ignorant! I never read that maps sorted their contents, and to be honest, I never used them.

  10. #10
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    As Micko suggested, multimaps are optimized for both insertion and lookup - which is precisely why they are such a popular choice for situations like this.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  11. #11
    Registered User
    Join Date
    Mar 2006
    Posts
    725
    Well you asked what sort I would use, the answer is std::sort() though it seems that you found a better way of doing it.
    To me maps would be better, as std::sort relies on abstraction through iterators, which can be prohibitively computationally expensive. Don't trust everything you read

    Both can be used to sort in O(n log n) by the way.
    Code:
    #include <stdio.h>
    
    void J(char*a){int f,i=0,c='1';for(;a[i]!='0';++i)if(i==81){
    puts(a);return;}for(;c<='9';++c){for(f=0;f<9;++f)if(a[i-i%27+i%9
    /3*3+f/3*9+f%3]==c||a[i%9+f*9]==c||a[i-i%9+f]==c)goto e;a[i]=c;J(a);a[i]
    ='0';e:;}}int main(int c,char**v){int t=0;if(c>1){for(;v[1][
    t];++t);if(t==81){J(v[1]);return 0;}}puts("sudoku [0-9]{81}");return 1;}

  12. #12
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    Well, std:: sort is from algorithm library and it is written so it can be applied on very lagre group of containers, so choosing right container is very important. kuphryn suggested maps/multimaps and I liked the idea and that's it.

    - Micko
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  13. #13
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    for large groups of data, wouldn't it be faster to just put the strings into a vector then use std::sort to sort them? Isn't multimap pretty slow at inserting new strings? I haven't timed the difference, so I really don't know, but since multimap (or just map) has to search for the right place to insert a new string I would think std::sort would be a lot faster.

  14. #14
    Registered User Micko's Avatar
    Join Date
    Nov 2003
    Posts
    715
    In my opinion vector container wouldn't be good choice because of practical reasons. Sort algorithm would move and swap elements, and multimaps preserve sorted structures. If you add new lines later, those lines would come to the right place in balanced binary tree, and vector would needed swapping again.
    That's only my opinion, I didn't measured time to check. Maybe some kind of profiler would be helpful here.
    Gotta love the "please fix this for me, but I'm not going to tell you which functions we're allowed to use" posts.
    It's like teaching people to walk by first breaking their legs - muppet teachers! - Salem

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > If you add new lines later, those lines would come to the right place in balanced binary tree
    True, but keeping a tree balanced, and finding the right place are not free actions.

    > As Micko suggested, multimaps are optimized for both insertion and lookup
    But this application is a single insert and a single sequential lookup. You don't see the benefit of many random accesses (which would obviously suck on a vector).

    But then again, the original example sorted on the second field of a line, which would need a more complex comparison function in the case of a vector sort.

    Here's my simple test - the input file was the bash manual page.
    Code:
    #include <iostream>
    #include <string>
    #include <fstream>
    #include <map>
    #include <algorithm>
    #include <vector>
    #include <ctime>
    using namespace std;
    
    // use a map
    void test1 ( const char *inFile, const char *outFile ) {
      multimap < string, string > cont;
      string value;
    
      ifstream file (inFile);
      while (file >> value) {
        cont.insert (make_pair(value, value));
      }
    
      multimap <string, string >::iterator it;
      ofstream out_file (outFile);
      for (it = cont.begin(); it != cont.end(); ++it) {
        out_file << it->second << endl;
      }
    }
    
    // use a vector
    void test2 ( const char *inFile, const char *outFile ) {
      vector < string > cont;
      string value;
    
      ifstream file (inFile);
      while (file >> value) {
        cont.push_back (value);
      }
    
      sort( cont.begin(), cont.end() );
      vector < string >::iterator it;
      ofstream out_file (outFile);
      for (it = cont.begin(); it != cont.end(); ++it) {
        out_file << *it << endl;
      }
    }
    
    // just the I/O overhead
    void test3 ( const char *inFile, const char *outFile ) {
      string value;
    
      ifstream file (inFile);
      ofstream out_file (outFile);
      while (file >> value) {
        out_file << value << endl;
      }
    }
    
    int main ( ) {
      clock_t t1, t2, t3, t4;
      t1 = clock();
      test1( "tmp.txt", "tmp1.txt" );
      t2 = clock();
      test2( "tmp.txt", "tmp2.txt" );
      t3 = clock();
      test3( "tmp.txt", "tmp3.txt" );
      t4 = clock();
      cout << "Test1 " << t2 - t1 << endl;
      cout << "Test2 " << t3 - t2 << endl;
      cout << "Test3 " << t4 - t3 << endl;
      return 0;
    }
    
    
    $ g++ -W -Wall -ansi -pedantic -O2 foo.cpp
    $ ./a.out
    Test1 470000
    Test2 420000
    Test3 210000
    $ ./a.out
    Test1 650000
    Test2 510000
    Test3 280000
    $ ./a.out
    Test1 640000
    Test2 500000
    Test3 280000
    $ ./a.out
    Test1 550000
    Test2 490000
    Test3 280000
    $ ./a.out
    Test1 590000
    Test2 500000
    Test3 270000
    $ wc tmp*.txt
     36251  36251 215615 tmp1.txt
     36251  36251 215615 tmp2.txt
     36251  36251 215615 tmp3.txt
      4793  36251 256697 tmp.txt
    113546 145004 903542 total
    $ diff tmp1.txt tmp2.txt
    $
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. JavaScript like sort algorithm?
    By audinue in forum C Programming
    Replies: 2
    Last Post: 07-28-2008, 12:27 AM
  2. help with debug (bubble sort algorithm)
    By lbraglia in forum C Programming
    Replies: 5
    Last Post: 10-19-2007, 05:24 PM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Replies: 2
    Last Post: 05-06-2003, 08:34 AM
  5. Sort algorithm and an ugly error
    By Trauts in forum C++ Programming
    Replies: 7
    Last Post: 02-25-2003, 12:36 PM