# I'm dreaming of making insertion sort O(nlogn)

Show 80 post(s) from this thread on one page
Page 1 of 2 12 Last
• 03-24-2010
RichSelian
I'm dreaming of making insertion sort O(nlogn)
Performing binary search on an sorted array costs O(logn) and inserting data into a linked list costs O(1). So if we take the advantages of both array and linked list, there will be an O(nlogn) sorting algorithm. Isn't it wonderful? But I don't see anyone is working on this idea. Is it impossible?
• 03-24-2010
tabstop
I'm not sure I see what the idea actually is. Inserting data into a linked list in order isn't O(1), and sorting a sorted array is a lot faster than O(n log n).
• 03-25-2010
iMalc
First of all it is for all practical purposes, impossible. No data structure can perform both O(1) insertion and O(1) random access (as is required for a binary search), without ridiculous memory requirements. You could do it in O(n!) space, but I wont go into that.

Secondly, most attempts to combine the properties of both linked-lists and arrays end up giving you the worst properties of both.
The only exception to this that I know of is what is used in an algorithm called OrderSort. It allows Insertion Sorting to be done in typically n^1.5 i.e. n*sqrt(n) time.
Essentially it uses a list stored within an array, and by performing sqrt(n) probes on the array, you can get within about sqrt(n) positions of the item in its linked list, giving 2*sqrt(n) reads to find the insertion point, and constant time insertion. This all relies on the data being fairly randomly ordered initially of course. The worst case is still O(n*n).
If you did fewer than sqrt(n) probes through the array, then you'd be left with significantly more probes through the list (which is slightly slower also), so it cannot be directly improved upon.

ShellSort is the closest you can come to O(n*log(n)) sorting time (it's typically slightly worse than that, depending on the gap sequence) using insertion (well k-insertion actually), which is nothing like what you are thinking of.

• 03-25-2010
RichSelian
So I am the first person to make insertion sort O(nlogn)?:)

In fact, I have written a new algorithm, which uses an array and a linked list at the same time. This algorithm costs O(nlogn) time and O(n) space (for the linked list). But its worst time complexity is still O(n^2). I am now fixing the worst time complexity. After that I'll show my algorithm to you.
• 03-25-2010
laserlight
Quote:

Originally Posted by RichSelian
So I am the first person to make insertion sort O(nlogn)?

Until we see the algorithm, we cannot tell. After all, it may be the case that you are simply mistaken.
• 03-25-2010
RichSelian
Now I have finished the algorithm. I can prove the average time complexity is O(nlogn). And the worst time is also O(nlogn) in real-time test but I haven't prove it.
But for my poor English, I cannot translate my Chinese proof into English. I have just added a simple analysis in the code.
Code:

// #############################################################################
// # FileName:  A-Bookmark.cpp
// # Author:    Zhang Li, Class CS0901
//              Huazhong University of Science & Technology
// #############################################################################
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <list>
#include <algorithm>

using namespace std;

// # sorting algorithm begin
// #############################################################################
template<typename DataType> class BookmarkSorter
{
public:
static void sort(vector<DataType>& elements)
{
//
// elementList is the linked list
// bookmarks is an array whose elements are pointers to nodes of the linked list
//  the data structure costs O(n) space complexity
//
// we will first perform a binary search on bookmarks, find the position for insertion (costs O(nlogn))
//  the total time cost of this part is O(nlogn)
//
// then we will insert the data into the linked list (costs O(1))
//  the total time cost of this part is (n * O(1)) == O(n)
//
// when we have inserted a certern amount of data into the linked list,
//  the bookmark can no longer represent the linked list,
//  and the time complexity of inserting a data will increase to O(n)
//
// so it is time to "update" the bookmark. It is very easy because we just simply save the pointers to
//  each node of the linked list.
//  this happens when this size of linked list is doubled (when we have inserted 1, 2, 4, 8... elements)
//  the total time cost of this part is (1 + 2 + 4 + ... + n) == O(n)
//
//  the total time complexity of this algorithm is O(nlogn)
//

list<DataType> elementList;
vector<typename list<DataType>::iterator> bookmarks;
typename list<DataType>::iterator scanIterator, insIterator;

if(elements.size() > 1)
{
// we insert the minimum and maximum data first to reduce the problem in binary searching
swap(*min_element(elements.begin(), elements.end()), elements.at(0));
swap(*max_element(elements.begin(), elements.end()), elements.at(1));

elementList.push_back(elements.at(0));
elementList.push_back(elements.at(1));

// insert other data
for(int scanPos = 2; scanPos < elements.size(); scanPos++)
{
// check whether the bookmark needs to update
if(scanPos >= bookmarks.size() * 2)
{
bookmarks.clear();
for(scanIterator = elementList.begin(); scanIterator != elementList.end(); scanIterator++)
bookmarks.push_back(scanIterator);
}

// perform a binary search to find the insert position
int balanceNum = 0, middle, left = 0, right = bookmarks.size() - 1;
while(left <= right)
{
middle = (left + right) / 2;
elements.at(scanPos) < *bookmarks.at(middle) ? right = middle - 1 : left = middle + 1;
}
insIterator = bookmarks.at(right);

// take a linear search to find the real insert position in the list, costs only O(1)
while(elements.at(scanPos) > *insIterator)
{
insIterator++;
// I added the following two lines to auto-balance the bookmark
// so the worst time complexity is expected to be O(nlogn), but I cannot prove it
if(bookmarks.at(right) != elementList.begin() && !(balanceNum++) & 1 == 1)
bookmarks.at(right)++;
}

// now insert data
elementList.insert(insIterator, elements.at(scanPos));
}

elements = vector<DataType>(elementList.begin(), elementList.end());
}
return;
}
};
// #############################################################################
// # sorting algorithm end

• 03-25-2010
RichSelian
This is a real-time test.
Code:

Sorting 50000 random double data
================ Test Result ================
Size Filename          Time                Space
2509 A-Quick.cpp      0.096s                717k
2259 A-Heap.cpp        0.100s                717k
2450 A-Merge.cpp      0.100s                785k
3315 A-Bookmark.cpp    0.100s                729k
5344 A-Library.cpp    0.124s                720k
4214 A-Skiplist.cpp    0.164s                949k
1603 S-Insertion.cpp  2.144s                727k
1628 S-Selection.cpp  3.660s                713k
1554 S-Bubble.cpp      13.233s                739k

Sorting 500000 random double data
================ Test Result ================
Size Filename          Time                Space
2509 A-Quick.cpp      0.912s                718k
2450 A-Merge.cpp      1.120s                790k
2259 A-Heap.cpp        1.192s                718k
3315 A-Bookmark.cpp    1.740s                729k
5344 A-Library.cpp    1.976s                722k
4214 A-Skiplist.cpp    2.524s                1995k

• 03-25-2010
RichSelian
My algorithm is not so quick as quick sort or merge sort. but it is really an O(nlogn) algorithm. I'm sure that there will be someone who can improve it to beat the other algorithm.

I leave my name in the source code. If this algorithm gets famous one day. I hope somebody knows it is written by a Chinese student in Grade one of university.;)
• 03-25-2010
phantomotap
Quote:

So I am the first person to make insertion sort O(nlogn)?
If you have done so, I'd love to see it. If you have done so, you should publish a paper. I'm betting that iMalc would love to help with that.

Quote:

My algorithm is not so quick as quick sort or merge sort.
I don't know what this algorithm is, but it isn't an O(n * log(n)) insertion sort. (If nothing else, it isn't stable.)

But it may be interesting; I've not really looked to see if you are doing anything interesting. I've not looked to see if you've done anything interesting because my very first test (a 100000 element array with random values) crashed with an uncaught exception. You'd have to provide a correct implementation before we can study the algorithm.

I suppose it could have been a fluke, but it seems unlikely.

In any event, you can't really just say "this is a sorting algorithm" and leave the audience waiting. What does it achieve? What are its other characteristics? It obviously isn't stable, but maybe it can be iteratively invoked over subsets? (Or something else.)

Quote:

I'm sure that there will be someone who can improve it to beat the other algorithm.
That is exceedingly unlikely. You couldn't manufacture an improvement of this algorithm and compare the result to an average implementation of "Quicksort"; you would need to compare the algorithm against one of the many potential improvements of "Quicksort".

"Quicksort" can also be implemented with only O(log(n)) space requirements, so you'd need to define "best" as being "fewer comparisons and assignments of the element type".

Soma
• 03-25-2010
anon
IMO it's not impossible that it's O(n log n), although I'm not sure if it can be called insertion sort (trading memory usage for speed). In any case it beats GCC's __insertion_sort quite nicely (which I couldn't wait to complete for larger arrays, where BookmarkSort completed in a matter of seconds).

However, I don't see much room for improvement, and std::sort beats it by a great margin.

Also, what about having to find the min and max element first? Could you do without it? One of the special properties of insertion sort is that it can sort incoming data, but in such case the min and max value won't be available to you. Otherwise I don't see any reason why I would like to use this.

-------

As to code style, free functions (e.g in a namespace) are fine in C++. You don't have to make every free function a static function in a class. And even then the class doesn't have to be a template, which here just doesn't let you use template type deduction.

Also you could make it warning-free.

Code:

for(int scanPos = 2; scanPos < elements.size(); scanPos++)
size() returns an unsigned type and comparisons between signed and unsigned types are worth a warning. (Use size_t, or the container's size_type for indices.)

Code:

if(bookmarks.at(right) != elementList.begin() && !(balanceNum++) & 1 == 1)
Here the compiler suggests you confirm your intentions with extra brackets (isn't it more complicated that it needs to be anyway?).
• 03-25-2010
RichSelian
Quote:

If you have done so, I'd love to see it. If you have done so, you should publish a paper. I'm betting that iMalc would love to help with that.
But I don't know how to write a paper.:(

Code:

I don't know what this algorithm is, but it isn't an O(n * log(n)) insertion sort. (If nothing else, it isn't stable.)
No, it's really O(nlogn) and a stable sort. I can prove it (but not in English).

Code:

I've not looked to see if you've done anything interesting because my very first test (a 100000 element array with random values) crashed with an uncaught exception.
Could you provide the exception information so I can fix it?

Code:

Also, what about having to find the min and max element first? Could you do without it?
This can exactly can be done. Just remove it and add some code in the binary searching part to get rid of std::out_of_range.

Code:

size() returns an unsigned type and comparisons between signed and unsigned types are worth a warning. (Use size_t, or the container's size_type for indices.)
Yes, I should use a size_t instead of int.
• 03-25-2010
phantomotap
O_o

You can't prove the implementation is an O(n * log(n)) stable sort because it isn't. I knew that when I posted. I'm not saying that the algorithm you call yourself implementing isn't such an algorithm; I'm only saying that this implementation is not it.

I grabbed a few minutes to run this implementation through my sorting benchmark. I'll post a section of the results.

This is the top 67 entries--corresponding to the errors logged during the first 100 iterations. In other words your implementation managed to successfully sort the data only 33 times out of first 100 tries. Your algorithm either crashed, failed to sort the data, or failed to keep elements in a stable order.

The rest of the log looks much the same.

Soma

Code:

error::: `results did not match {state::[seed:(28595) size:(9) duplicates:(2)]}'
error::: `results did not match {state::[seed:(30429) size:(10) duplicates:(2)]}'
error::: `results did not match {state::[seed:(32247) size:(12) duplicates:(2)]}'
error::: `results did not match {state::[seed:(27736) size:(13) duplicates:(2)]}'
error::: `results did not match {state::[seed:(12816) size:(15) duplicates:(2)]}'
error::: `results did not match {state::[seed:(5119) size:(16) duplicates:(2)]}'
error::: `results did not match {state::[seed:(9033) size:(18) duplicates:(2)]}'
error::: `results did not match {state::[seed:(8191) size:(19) duplicates:(2)]}'
error::: `results did not match {state::[seed:(4233) size:(21) duplicates:(2)]}'
error::: `results did not match {state::[seed:(27435) size:(22) duplicates:(2)]}'
error::: `results did not match {state::[seed:(22877) size:(24) duplicates:(2)]}'
error::: `results did not match {state::[seed:(29365) size:(25) duplicates:(2)]}'
error::: `results did not match {state::[seed:(16812) size:(27) duplicates:(2)]}'
error::: `results did not match {state::[seed:(19569) size:(28) duplicates:(2)]}'
error::: `results did not match {state::[seed:(1333) size:(30) duplicates:(2)]}'
error::: `results did not match {state::[seed:(5302) size:(31) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(27409) size:(34) duplicates:(2)]}'
error::: `results did not match {state::[seed:(1006) size:(36) duplicates:(2)]}'
error::: `results did not match {state::[seed:(23379) size:(37) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(24222) size:(40) duplicates:(2)]}'
error::: `results did not match {state::[seed:(18365) size:(42) duplicates:(2)]}'
error::: `results did not match {state::[seed:(11216) size:(43) duplicates:(2)]}'
error::: `results did not match {state::[seed:(4126) size:(45) duplicates:(2)]}'
error::: `results did not match {state::[seed:(20340) size:(46) duplicates:(2)]}'
error::: `results did not match {state::[seed:(19433) size:(48) duplicates:(2)]}'
error::: `results did not match {state::[seed:(20733) size:(49) duplicates:(2)]}'
error::: `results did not match {state::[seed:(11113) size:(51) duplicates:(2)]}'
error::: `results did not match {state::[seed:(18565) size:(52) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(5534) size:(55) duplicates:(2)]}'
error::: `results did not match {state::[seed:(17582) size:(57) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(21371) size:(60) duplicates:(2)]}'
error::: `results did not match {state::[seed:(23975) size:(61) duplicates:(2)]}'
error::: `results did not match {state::[seed:(13729) size:(63) duplicates:(2)]}'
error::: `results did not match {state::[seed:(3671) size:(64) duplicates:(2)]}'
error::: `results did not match {state::[seed:(23583) size:(65) duplicates:(2)]}'
error::: `results did not match {state::[seed:(4478) size:(66) duplicates:(2)]}'
error::: `results did not match {state::[seed:(18114) size:(67) duplicates:(2)]}'
error::: `results did not match {state::[seed:(32697) size:(69) duplicates:(2)]}'
error::: `results did not match {state::[seed:(6190) size:(70) duplicates:(2)]}'
error::: `results did not match {state::[seed:(27433) size:(72) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(14943) size:(78) duplicates:(2)]}'
error::: `results did not match {state::[seed:(3089) size:(79) duplicates:(2)]}'
error::: `results did not match {state::[seed:(26937) size:(81) duplicates:(2)]}'
error::: `results did not match {state::[seed:(30725) size:(82) duplicates:(2)]}'
error::: `results did not match {state::[seed:(19539) size:(84) duplicates:(2)]}'
error::: `results did not match {state::[seed:(30477) size:(85) duplicates:(2)]}'
error::: `results did not match {state::[seed:(6451) size:(86) duplicates:(2)]}'
error::: `results did not match {state::[seed:(26407) size:(87) duplicates:(2)]}'
error::: `results did not match {state::[seed:(27994) size:(88) duplicates:(2)]}'
error::: `results did not match {state::[seed:(15731) size:(90) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(23593) size:(94) duplicates:(2)]}'
error::: `results did not match {state::[seed:(17822) size:(96) duplicates:(2)]}'
error::: `results did not match {state::[seed:(29345) size:(97) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'
error::: `results did not match {state::[seed:(1608) size:(100) duplicates:(2)]}'
error::: `results did not match {state::[seed:(12434) size:(102) duplicates:(2)]}'
error::: `results did not match {state::[seed:(9613) size:(103) duplicates:(2)]}'
error::: `uncaught exception {state::[std::exception::what:(vector::_M_range_check)]}'

• 03-26-2010
iMalc
I got different results. I had it sort correctly every time, and it passed my basic sorting tests, including confirming that it performed a stable sort.
I did modify the code to work on an array rather than a vector though, so something in my changes could have fixed a problem, or something in phantomotap's changes may have introduced a problem, or maybe my unit tests in this area just suck. I know my sort tests aren't as thorough as some of my other tests.

I also am very sure this is not an O(n*logn) sort in the worst case. In fact I think one of my initial orderings may have forced a worst or at least one of the worst cases out of it.
That ordering was when the first half of a sorted array is swapped with the second half.
Bear in mind that my code exaggerates the time taken for swaps and comparisons, which has the effect of making loops that do neither seem to take very little time at all.

I'll post more soon, gotta eat tea now.
• 03-26-2010
RichSelian
It is strange. What data type are you sorting? I try to sort 100,000 random double data for 100 times and there's neither wrong answers nor exceptions. I didn't check the stability because the test script is difficult to write. Or maybe there are bugs hiding in the code, but at least the theory is correct.
By the way, could you provide your test script for me?
• 03-26-2010
RichSelian
The std::list<> also slow down the speed because we need singly linked list but it provides doubly linked list. I use a vector to emulate the list and get about 20% time improvement.
Show 80 post(s) from this thread on one page
Page 1 of 2 12 Last