Thread: I'm dreaming of making insertion sort O(nlogn)

  1. #1
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    I'm dreaming of making insertion sort O(nlogn)

    Performing binary search on an sorted array costs O(logn) and inserting data into a linked list costs O(1). So if we take the advantages of both array and linked list, there will be an O(nlogn) sorting algorithm. Isn't it wonderful? But I don't see anyone is working on this idea. Is it impossible?

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    I'm not sure I see what the idea actually is. Inserting data into a linked list in order isn't O(1), and sorting a sorted array is a lot faster than O(n log n).

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    First of all it is for all practical purposes, impossible. No data structure can perform both O(1) insertion and O(1) random access (as is required for a binary search), without ridiculous memory requirements. You could do it in O(n!) space, but I wont go into that.

    Secondly, most attempts to combine the properties of both linked-lists and arrays end up giving you the worst properties of both.
    The only exception to this that I know of is what is used in an algorithm called OrderSort. It allows Insertion Sorting to be done in typically n^1.5 i.e. n*sqrt(n) time.
    Essentially it uses a list stored within an array, and by performing sqrt(n) probes on the array, you can get within about sqrt(n) positions of the item in its linked list, giving 2*sqrt(n) reads to find the insertion point, and constant time insertion. This all relies on the data being fairly randomly ordered initially of course. The worst case is still O(n*n).
    If you did fewer than sqrt(n) probes through the array, then you'd be left with significantly more probes through the list (which is slightly slower also), so it cannot be directly improved upon.

    ShellSort is the closest you can come to O(n*log(n)) sorting time (it's typically slightly worse than that, depending on the gap sequence) using insertion (well k-insertion actually), which is nothing like what you are thinking of.

    More info on my website.
    Last edited by iMalc; 03-25-2010 at 12:39 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    Smile

    So I am the first person to make insertion sort O(nlogn)?

    In fact, I have written a new algorithm, which uses an array and a linked list at the same time. This algorithm costs O(nlogn) time and O(n) space (for the linked list). But its worst time complexity is still O(n^2). I am now fixing the worst time complexity. After that I'll show my algorithm to you.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by RichSelian
    So I am the first person to make insertion sort O(nlogn)?
    Until we see the algorithm, we cannot tell. After all, it may be the case that you are simply mistaken.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    Smile

    Now I have finished the algorithm. I can prove the average time complexity is O(nlogn). And the worst time is also O(nlogn) in real-time test but I haven't prove it.
    But for my poor English, I cannot translate my Chinese proof into English. I have just added a simple analysis in the code.
    Code:
    // #############################################################################
    // # FileName:  A-Bookmark.cpp
    // # Author:    Zhang Li, Class CS0901
    //              Huazhong University of Science & Technology
    // #############################################################################
    #include <cstdio>
    #include <cstdlib>
    #include <vector>
    #include <list>
    #include <algorithm>
    
    using namespace std;
    
    // # sorting algorithm begin
    // #############################################################################
    template<typename DataType> class BookmarkSorter
    {
    public:
        static void sort(vector<DataType>& elements)
        {
            //
            // elementList is the linked list
            // bookmarks is an array whose elements are pointers to nodes of the linked list
            //  the data structure costs O(n) space complexity
            //
            // we will first perform a binary search on bookmarks, find the position for insertion (costs O(nlogn))
            //  the total time cost of this part is O(nlogn)
            //
            // then we will insert the data into the linked list (costs O(1))
            //  the total time cost of this part is (n * O(1)) == O(n)
            //
            // when we have inserted a certern amount of data into the linked list,
            //  the bookmark can no longer represent the linked list,
            //  and the time complexity of inserting a data will increase to O(n)
            //
            // so it is time to "update" the bookmark. It is very easy because we just simply save the pointers to
            //  each node of the linked list.
            //  this happens when this size of linked list is doubled (when we have inserted 1, 2, 4, 8... elements)
            //  the total time cost of this part is (1 + 2 + 4 + ... + n) == O(n)
            //
            //  the total time complexity of this algorithm is O(nlogn)
            //
    
            list<DataType> elementList;
            vector<typename list<DataType>::iterator> bookmarks;
            typename list<DataType>::iterator scanIterator, insIterator;
    
            if(elements.size() > 1)
            {
                // we insert the minimum and maximum data first to reduce the problem in binary searching
                swap(*min_element(elements.begin(), elements.end()), elements.at(0));
                swap(*max_element(elements.begin(), elements.end()), elements.at(1));
    
                elementList.push_back(elements.at(0));
                elementList.push_back(elements.at(1));
    
                // insert other data
                for(int scanPos = 2; scanPos < elements.size(); scanPos++)
                {
                    // check whether the bookmark needs to update
                    if(scanPos >= bookmarks.size() * 2)
                    {
                        bookmarks.clear();
                        for(scanIterator = elementList.begin(); scanIterator != elementList.end(); scanIterator++)
                            bookmarks.push_back(scanIterator);
                    }
    
                    // perform a binary search to find the insert position
                    int balanceNum = 0, middle, left = 0, right = bookmarks.size() - 1;
                    while(left <= right)
                    {
                        middle = (left + right) / 2;
                        elements.at(scanPos) < *bookmarks.at(middle) ? right = middle - 1 : left = middle + 1;
                    }
                    insIterator = bookmarks.at(right);
    
                    // take a linear search to find the real insert position in the list, costs only O(1)
                    while(elements.at(scanPos) > *insIterator)
                    {
                        insIterator++;
                        // I added the following two lines to auto-balance the bookmark
                        // so the worst time complexity is expected to be O(nlogn), but I cannot prove it
                        if(bookmarks.at(right) != elementList.begin() && !(balanceNum++) & 1 == 1)
                            bookmarks.at(right)++;
                    }
    
                    // now insert data
                    elementList.insert(insIterator, elements.at(scanPos));
                }
    
                elements = vector<DataType>(elementList.begin(), elementList.end());
            }
            return;
        }
    };
    // #############################################################################
    // # sorting algorithm end

  7. #7
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    Smile

    This is a real-time test.
    Code:
    Sorting 50000 random double data
    ================ Test Result ================
    Size Filename          Time		Space
    2509 A-Quick.cpp       0.096s		717k
    2259 A-Heap.cpp        0.100s		717k
    2450 A-Merge.cpp       0.100s		785k
    3315 A-Bookmark.cpp    0.100s		729k
    5344 A-Library.cpp     0.124s		720k
    4214 A-Skiplist.cpp    0.164s		949k
    1603 S-Insertion.cpp   2.144s		727k
    1628 S-Selection.cpp   3.660s		713k
    1554 S-Bubble.cpp      13.233s		739k
    
    Sorting 500000 random double data
    ================ Test Result ================
    Size Filename          Time		Space
    2509 A-Quick.cpp       0.912s		718k
    2450 A-Merge.cpp       1.120s		790k
    2259 A-Heap.cpp        1.192s		718k
    3315 A-Bookmark.cpp    1.740s		729k
    5344 A-Library.cpp     1.976s		722k
    4214 A-Skiplist.cpp    2.524s		1995k

  8. #8
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74

    Smile

    My algorithm is not so quick as quick sort or merge sort. but it is really an O(nlogn) algorithm. I'm sure that there will be someone who can improve it to beat the other algorithm.

    I leave my name in the source code. If this algorithm gets famous one day. I hope somebody knows it is written by a Chinese student in Grade one of university.

  9. #9
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    So I am the first person to make insertion sort O(nlogn)?
    If you have done so, I'd love to see it. If you have done so, you should publish a paper. I'm betting that iMalc would love to help with that.

    My algorithm is not so quick as quick sort or merge sort.
    I don't know what this algorithm is, but it isn't an O(n * log(n)) insertion sort. (If nothing else, it isn't stable.)

    But it may be interesting; I've not really looked to see if you are doing anything interesting. I've not looked to see if you've done anything interesting because my very first test (a 100000 element array with random values) crashed with an uncaught exception. You'd have to provide a correct implementation before we can study the algorithm.

    I suppose it could have been a fluke, but it seems unlikely.

    In any event, you can't really just say "this is a sorting algorithm" and leave the audience waiting. What does it achieve? What are its other characteristics? It obviously isn't stable, but maybe it can be iteratively invoked over subsets? (Or something else.)

    I'm sure that there will be someone who can improve it to beat the other algorithm.
    That is exceedingly unlikely. You couldn't manufacture an improvement of this algorithm and compare the result to an average implementation of "Quicksort"; you would need to compare the algorithm against one of the many potential improvements of "Quicksort".

    "Quicksort" can also be implemented with only O(log(n)) space requirements, so you'd need to define "best" as being "fewer comparisons and assignments of the element type".

    Soma
    Last edited by phantomotap; 03-25-2010 at 09:32 AM.

  10. #10
    The larch
    Join Date
    May 2006
    Posts
    3,573
    IMO it's not impossible that it's O(n log n), although I'm not sure if it can be called insertion sort (trading memory usage for speed). In any case it beats GCC's __insertion_sort quite nicely (which I couldn't wait to complete for larger arrays, where BookmarkSort completed in a matter of seconds).

    However, I don't see much room for improvement, and std::sort beats it by a great margin.

    Also, what about having to find the min and max element first? Could you do without it? One of the special properties of insertion sort is that it can sort incoming data, but in such case the min and max value won't be available to you. Otherwise I don't see any reason why I would like to use this.

    -------

    As to code style, free functions (e.g in a namespace) are fine in C++. You don't have to make every free function a static function in a class. And even then the class doesn't have to be a template, which here just doesn't let you use template type deduction.

    Also you could make it warning-free.

    Code:
    for(int scanPos = 2; scanPos < elements.size(); scanPos++)
    size() returns an unsigned type and comparisons between signed and unsigned types are worth a warning. (Use size_t, or the container's size_type for indices.)

    Code:
    if(bookmarks.at(right) != elementList.begin() && !(balanceNum++) & 1 == 1)
    Here the compiler suggests you confirm your intentions with extra brackets (isn't it more complicated that it needs to be anyway?).
    Last edited by anon; 03-25-2010 at 01:49 PM.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  11. #11
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    If you have done so, I'd love to see it. If you have done so, you should publish a paper. I'm betting that iMalc would love to help with that.
    But I don't know how to write a paper.

    Code:
    I don't know what this algorithm is, but it isn't an O(n * log(n)) insertion sort. (If nothing else, it isn't stable.)
    No, it's really O(nlogn) and a stable sort. I can prove it (but not in English).

    Code:
    I've not looked to see if you've done anything interesting because my very first test (a 100000 element array with random values) crashed with an uncaught exception.
    Could you provide the exception information so I can fix it?

    Code:
    Also, what about having to find the min and max element first? Could you do without it?
    This can exactly can be done. Just remove it and add some code in the binary searching part to get rid of std:ut_of_range.

    Code:
    size() returns an unsigned type and comparisons between signed and unsigned types are worth a warning. (Use size_t, or the container's size_type for indices.)
    Yes, I should use a size_t instead of int.

  12. #12
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    You can't prove the implementation is an O(n * log(n)) stable sort because it isn't. I knew that when I posted. I'm not saying that the algorithm you call yourself implementing isn't such an algorithm; I'm only saying that this implementation is not it.

    I grabbed a few minutes to run this implementation through my sorting benchmark. I'll post a section of the results.

    This is the top 67 entries--corresponding to the errors logged during the first 100 iterations. In other words your implementation managed to successfully sort the data only 33 times out of first 100 tries. Your algorithm either crashed, failed to sort the data, or failed to keep elements in a stable order.

    The rest of the log looks much the same.

    Soma

    Code:
    error::: `results did not match {state::[seed:(28595) size:(9) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(30429) size:(10) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(32247) size:(12) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(27736) size:(13) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(12816) size:(15) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(5119) size:(16) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(9033) size:(18) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(8191) size:(19) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(4233) size:(21) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(27435) size:(22) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(22877) size:(24) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(29365) size:(25) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(16812) size:(27) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(19569) size:(28) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(1333) size:(30) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(5302) size:(31) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(27409) size:(34) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(1006) size:(36) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(23379) size:(37) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(24222) size:(40) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(18365) size:(42) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(11216) size:(43) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(4126) size:(45) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(20340) size:(46) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(19433) size:(48) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(20733) size:(49) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(11113) size:(51) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(18565) size:(52) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(5534) size:(55) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(17582) size:(57) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(21371) size:(60) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(23975) size:(61) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(13729) size:(63) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(3671) size:(64) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(23583) size:(65) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(4478) size:(66) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(18114) size:(67) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(32697) size:(69) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(6190) size:(70) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(27433) size:(72) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(14943) size:(78) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(3089) size:(79) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(26937) size:(81) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(30725) size:(82) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(19539) size:(84) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(30477) size:(85) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(6451) size:(86) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(26407) size:(87) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(27994) size:(88) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(15731) size:(90) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(23593) size:(94) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(17822) size:(96) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(29345) size:(97) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
    error::: `results did not match {state::[seed:(1608) size:(100) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(12434) size:(102) duplicates:(2)]}'
    error::: `results did not match {state::[seed:(9613) size:(103) duplicates:(2)]}'
    error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I got different results. I had it sort correctly every time, and it passed my basic sorting tests, including confirming that it performed a stable sort.
    I did modify the code to work on an array rather than a vector though, so something in my changes could have fixed a problem, or something in phantomotap's changes may have introduced a problem, or maybe my unit tests in this area just suck. I know my sort tests aren't as thorough as some of my other tests.

    I also am very sure this is not an O(n*logn) sort in the worst case. In fact I think one of my initial orderings may have forced a worst or at least one of the worst cases out of it.
    That ordering was when the first half of a sorted array is swapped with the second half.
    Bear in mind that my code exaggerates the time taken for swaps and comparisons, which has the effect of making loops that do neither seem to take very little time at all.

    I'll post more soon, gotta eat tea now.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  14. #14
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    It is strange. What data type are you sorting? I try to sort 100,000 random double data for 100 times and there's neither wrong answers nor exceptions. I didn't check the stability because the test script is difficult to write. Or maybe there are bugs hiding in the code, but at least the theory is correct.
    By the way, could you provide your test script for me?

  15. #15
    Registered User
    Join Date
    Mar 2010
    Location
    China
    Posts
    74
    The std::list<> also slow down the speed because we need singly linked list but it provides doubly linked list. I use a vector to emulate the list and get about 20% time improvement.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 26
    Last Post: 07-05-2010, 10:43 AM
  2. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  3. Request for comments
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 15
    Last Post: 01-02-2004, 10:33 AM
  4. recursive insertion sort.
    By Alien_Freak in forum C++ Programming
    Replies: 3
    Last Post: 03-23-2002, 01:31 AM
  5. Pls Help: Insertion sort using pointer notation
    By mack_ol in forum C Programming
    Replies: 1
    Last Post: 02-17-2002, 01:03 PM