Thread: Well and truly sorted

  1. #1
    Registered User intrepid_is's Avatar
    Join Date
    Nov 2006
    Location
    London
    Posts
    1

    Lightbulb Well and truly sorted

    Hi there,

    I have created a sorting algorithm that is better than Quick Sort and even competes with Introspective Sort, but is constructed in a much simpler way.

    Please could you scrutinise the code, it is rather simple and shouldn't take long to look through. I would like to know what people think of it. I have freely released the algorithm for anyone use and develop further.

    Follow this link to download it:
    Click Here

    Thanks.


    Chris Nash

  2. #2
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318

    Red face

    I have alrady implemented that sorting algorithm. I call it 'mean sort' because it gets its splitter from the mean of the highest and lowest numbers.

    This algorithm is still O(n*n) worst case, but only when the input data is logarithmically distributed. Such an algorithm is also slightly less general than quicksort because it involves adding and dividing keys rather than just comparing. Note that this same technique can also be used for list sorting, which I have also implemented.

    My implementation is much more efficient as I have been studying sorting algorithms for a long time and have optimised it well. Your code also has two recursive calls rather than processing the larger half by iteration. Therefore it is subject to stack overflow with pathelogical input cases.
    My implementation also comes out shorter, although that could be due to fewer comments. I also use a final insertion sort pass to sort the almost sorted array.
    I suggest not bothering with the special case for sorting two items you'll find that is does not really improve the speed at all.
    Last edited by iMalc; 11-11-2006 at 05:27 PM.

  3. #3
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Wow, iMalc. I thought that I was egotistical, but you make me look like buddha with that post.

    >My implementation is much more efficient as I have been studying sorting algorithms for a long time and have optimised it well.
    Your reasoning has no bearing at all on your conclusion. In fact, your logic is fallacious. I'd much rather see concrete tests that prove your claim rather than a whole lot of "holier than thou" hot air.
    My best code is written with the delete key.

  4. #4
    MFC killed my cat! manutd's Avatar
    Join Date
    Sep 2006
    Location
    Boston, Massachusetts
    Posts
    870
    Honestly, iMalc, these posts are supposed to be helpful and encouraging, not "your attempt fails as obviously mine is better".
    Silence is better than unmeaning words.
    - Pythagoras
    My blog

  5. #5
    Its hard... But im here swgh's Avatar
    Join Date
    Apr 2005
    Location
    England
    Posts
    1,688
    I guess he got the wrong idea when answering
    Double Helix STL

  6. #6
    MFC killed my cat! manutd's Avatar
    Join Date
    Sep 2006
    Location
    Boston, Massachusetts
    Posts
    870
    Yeah, you could say that.
    Silence is better than unmeaning words.
    - Pythagoras
    My blog

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Oh whoops I can certainly see how one might take it all the wrong way. I was being helpful. Well done for discovering the algorithm!

    No it wasn't my ego talking, I actually happen to really know a lot about sorting, as should be clear from the information provided. I merely state the facts, you'll get over my personality soon enough and realise I mean no disrespect.

    It is worth noting that introspective sort is in no way faster than quicksort with final-insertion sort. It's purpose is simply to prevent O(n*n) cases, and to do so actually slows the algorithm down slightly most of the time.

    I'm happy to point anyone towards more information on the subject.

  8. #8
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    Nice algorithm! d-(~_O)

    Still worst case time O(n^2) though; time depending on the distribution of the numbers rather that on their order.
    For logarithmic scales average time is O(n^2).
    Last edited by TriKri; 11-11-2006 at 07:11 PM.

  9. #9
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    What makes me curious though is people still search for generic sorting algorithms. My question is, when was the last generic wildly accepted sorting algorithm invented?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Mario F.
    What makes me curious though is people still search for generic sorting algorithms. My question is, when was the last generic wildly accepted sorting algorithm invented?
    That depends on your definition of "widely accepted".
    Library-Sort was invented in 2004, and Flash-Sort, in 1998.
    I myself have invented a very good modification to list-based-quicksort that I have not yet published.

  11. #11
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Hmm, I now have time for some closer scrutiny...

    Your partitioning inner loop seems pretty good as does your insertion sort inner loop. Both seem to be about as optimal as mine. I do use a smaller value for final-insertion sort cuttoff value btw, 16 vs intrepid_is' 25.

    I have now timed both algorithms for sorting 10 million evenly distributed integers in a randomly shuffled order. (Yes it was release build and I know how to ensure that nothing is optimised out) I also repeated the runs many times and took the lowest timings of all runs.
    The results of both sorts were confirmed correctly sorted. my version = 1838ms intrepid_is' version = 1846ms. I'm really impressed! There is almost nothing in it! I think I was overly critical earlier.

    In my algorithm I have included the code which calculates the min and max as I consider that to be part of the algorithm. In yours you have made it the responsibility of the caller to provide these correct values. That does mean that the caller it more tightly coupled with the algorithm, but can mean recalculation of min and max can be avoided sometimes. To be fair, this code was included in the timing btw. Actualy perhaps our algorithms should allow the caller to specify the min and max, or let the algorithm calculate it.

    I suggest templating your version of the algorithm so it can sort more than just ints. I also definitely think it would be worthwhile to make one of your recursive calls into an iterative loop. You'd have to reorder your "if-else-if-else-if" to do this. Actually the 3 places these occur can be reduced down to just one in HalfSortInt, which then makes conversion to the iterative form trivial. I still say remove the special case for sorting 2 items as well, you have many more special cases than I do.
    I always encourage the use of asserts, and 'const' where appropriate.

    In my version I do this:
    Code:
    	int mid = (lo+hi)>>1;
    instead of this:
    Code:
    	Mid = (Max - Min)/2 + Min;
    I suggest using:
    Code:
    	NewElements = RightPtr - FirstPtr;
    instead of:
    Code:
    	NewElements = ((int)RightPtr - (int)FirstPtr) / sizeof(int);
    as they are equivalent, but the earlier one is safer.

    We have basically implemented the exact same algorithm, albeit slightly diferently. I had noted down that I wrote mine in 2004, though I certainly didn't think I was the first to come up with the idea.
    Get in touch some time if you'd like to compare versions or maybe even see what 60 other array sorting algorithms I have implemented in C++.
    Last edited by iMalc; 11-12-2006 at 01:59 AM.

  12. #12
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    > as they are equivalent, but the earlier one is safer.

    Does the division come automaticly if you don't type cast so you don't have to do it? In that case, when do the division come?
    Last edited by TriKri; 11-13-2006 at 04:14 PM.

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318

    Wink

    Quote Originally Posted by TriKri
    > as they are equivalent, but the earlier one is safer.

    Does the division come automaticly if you don't type cast so you don't have to do it?
    Exactly. Just as performing ++ on a pointer will increment it by the amount required to point to the next item (not just the next byte), subtracting pointers gives you the number of items they are apart (of whatever type they point to), not just the number of bytes they are apart.
    > as they are equivalent, but the earlier one is safer.
    In that case, when do the division come?
    It is all done automagically. Nice huh?!

    The reason it is safer is that no assumption about the size of a pointer is necessary (unnecessary casts are bad). Also, because it is clearer, there is less room for programmer error.

  14. #14
    Registered User
    Join Date
    Mar 2006
    Posts
    725
    The code archive is not exactly "100K". Give exact numbers, hash values, and remove the executable, or risk ostracization by paranoid hackers in dark glasses.

    One problem it that it's not a comparison sort. There are non-comparison sorts with worst-case linear time in-place algorithms like radix sort.

    Your sort is worse than radix sort in terms of usability even. It is basically an int sort, for integers only.

    Why is it "simpler" than quicksort? In fact your implementation is basically a simple, unoptimized, non-generic quicksort with bad worst-case runtime. Introspective sort was not invented to be a faster algorithm, it was invented to be a safer and more reliable algorithm capable of running on servers and other performance-critical sorting daemons.



    Now about the code.

    Adding a two-element swap will probably not make the algorithm faster, as you are just dealing with ints, which is by the way a severe limitation. In fact it could slow down the code by playing precariously with the processor pipeline. Have you profiled it? At the profound risk of ostracization, I would claim you hadn't.

    Division instead of shift. Many older processors have a notoriously slow numerical processing unit, and newer ones aren't likely to do a lot better. But then again, you're using signed int to specify sizes instead of the correct size_t so I should keep quiet here.

    Postincrement/decrement.

    Heavily repetitive code.

    Non-portable (casting pointers to int, pragmas, usage of non-standard headers and functions).

    C-style headers and code. Type-unsafe macros, clumsy function pointers and general pointer molestation. General avoidance of C++ specific concepts, like templates, iterators and inlining.



    There's more where that came from, and you can quote me on that. Wake up and smell C++ and real computer science. The programming world cannot use your Half-Baked sort, I'm afraid.
    Last edited by jafet; 11-14-2006 at 03:30 AM.
    Code:
    #include <stdio.h>
    
    void J(char*a){int f,i=0,c='1';for(;a[i]!='0';++i)if(i==81){
    puts(a);return;}for(;c<='9';++c){for(f=0;f<9;++f)if(a[i-i%27+i%9
    /3*3+f/3*9+f%3]==c||a[i%9+f*9]==c||a[i-i%9+f]==c)goto e;a[i]=c;J(a);a[i]
    ='0';e:;}}int main(int c,char**v){int t=0;if(c>1){for(;v[1][
    t];++t);if(t==81){J(v[1]);return 0;}}puts("sudoku [0-9]{81}");return 1;}

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Yes it is not a comparison based sort, but it can at least be made to work when the items at least just define a subtraction operator which returns an integral value. The actual data type does not need to be integral. It's not much consolation I know.

    I profiled his code with and without the 2-item swap. It made no noticeable difference either way. I think that removing the repetition might have made a difference though.

Popular pages Recent additions subscribe to a feed