Thread: Linked List Sorting

  1. #1
    Registered User
    Join Date
    Feb 2013
    Location
    Buea Cameroon
    Posts
    64

    Linked List Sorting

    Is it more efficient to sort a linked list after the values have been inserted into the list, OR to sort each item before inserting it into the list so as to create an ordered list?
    I am assuming the worst-case scenario for the items to be sorted.

    And also, please: what is the most efficient data structure to use for random insertion and searching of unordered data?
    Last edited by acho.arnold; 05-17-2013 at 02:48 AM.

  2. #2
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by acho.arnold View Post
    Is it more efficient to sort a linked list after the values have been inserted into the list, OR to sort each item before inserting it into the list so as to create an ordered list?
    I am assuming the worst-case scenario for the items to be sorted.
    This question isn't well posed. If the values are initially stored in an array, it will be easier to sort them first using the highly efficient qsort() provided by your implementation. But merge sort is O(N log N) and can be implemented for a linked list.
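
    For illustration, a minimal merge sort sketch for a singly linked list (the node layout with an int value payload is an assumption made up for this example, not anything from the thread):

    Code:
    #include <stddef.h>

    struct node {
        int value;              /* assumed int payload, for illustration */
        struct node *next;
    };

    /* Split the list into two halves using the slow/fast pointer trick. */
    static struct node *split(struct node *head)
    {
        struct node *slow = head, *fast = head->next;
        while (fast && fast->next) {
            slow = slow->next;
            fast = fast->next->next;
        }
        struct node *second = slow->next;
        slow->next = NULL;
        return second;
    }

    /* Merge two sorted lists into one sorted list. */
    static struct node *merge(struct node *a, struct node *b)
    {
        struct node dummy, *tail = &dummy;
        while (a && b) {
            if (a->value <= b->value) { tail->next = a; a = a->next; }
            else                      { tail->next = b; b = b->next; }
            tail = tail->next;
        }
        tail->next = a ? a : b;
        return dummy.next;
    }

    /* O(N log N), and no random access needed, so it suits linked lists. */
    struct node *merge_sort(struct node *head)
    {
        if (head == NULL || head->next == NULL)
            return head;
        struct node *second = split(head);
        return merge(merge_sort(head), merge_sort(second));
    }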

    And also, please: what is the most efficient data structure to use for random insertion and searching of unordered data?
    A tree. If we are storing strings (for the sake of argument), we split the strings into two roughly equal groups, A-M on one side, N-Z on the other. You do that recursively until the leaves hold only one string each. So you can quickly find any string, and you can insert data by going to the right leaf and splitting it.
    However, the problem is that inserts are seldom truly random, so trees tend to become unbalanced, and the structure breaks down as the tree turns into what is effectively a linked list. There are many ways of preventing this, but the question of which is best is a complicated one.
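    A minimal unbalanced version of such a tree might look like this (the names are made up for the sketch, duplicate keys are ignored, and the rebalancing problem described above is deliberately left unsolved):

    Code:
    #include <stdlib.h>
    #include <string.h>

    struct tnode {
        char *key;
        struct tnode *left, *right;
    };

    /* Insert key into the tree; duplicate keys are ignored. */
    struct tnode *tree_insert(struct tnode *root, char *key)
    {
        if (root == NULL) {
            struct tnode *n = malloc(sizeof *n);  /* error check omitted */
            n->key = key;
            n->left = n->right = NULL;
            return n;
        }
        int cmp = strcmp(key, root->key);
        if (cmp < 0)
            root->left = tree_insert(root->left, key);
        else if (cmp > 0)
            root->right = tree_insert(root->right, key);
        return root;
    }

    /* O(log N) on average, but O(N) once the tree degenerates into a list. */
    struct tnode *tree_search(struct tnode *root, const char *key)
    {
        while (root) {
            int cmp = strcmp(key, root->key);
            if (cmp == 0)
                return root;
            root = cmp < 0 ? root->left : root->right;
        }
        return NULL;
    }
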
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  3. #3
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by acho.arnold View Post
    Is it more efficient to sort a linked list after the values have been inserted into the list, OR to sort each item before inserting it into the list so as to create an ordered list?
    Depends on the size of the data elements and the number of elements. If you're just sorting the data values in the list, you could create an array of pointers to the data values and use that to sort the data, with just about any sort algorithm.
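
    For instance, assuming the payload is an int (the names here are invented for the sketch), the standard qsort() can sort an array of pointers into the list:

    Code:
    #include <stdlib.h>

    /* qsort callback: each array element is a pointer to an int payload. */
    static int cmp_int_ptr(const void *a, const void *b)
    {
        const int *x = *(const int * const *)a;
        const int *y = *(const int * const *)b;
        return (*x > *y) - (*x < *y);
    }

    /* ptrs holds n pointers to the list's data values; only pointers move. */
    void sort_value_pointers(int **ptrs, size_t n)
    {
        qsort(ptrs, n, sizeof *ptrs, cmp_int_ptr);
    }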

    Quote Originally Posted by acho.arnold View Post
    What is the most efficient data structure to use for random insertion and searching of unordered data?
    Again, it depends on the size of the data elements and the number of elements. You could just use a flat array and use memmove to move a portion of the array upwards in order to insert data. If data isn't continuously being added and/or removed from the set of data, then you could just wait until all of the data is stored in the array, then sort the array. Doing a binary search on a sorted array is normally the fastest method for searching, unless the data values are so small that you can use them directly as array indexes.
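
    A minimal sketch of both operations, assuming int data and a caller that has reserved room for one more element (names invented for the example):

    Code:
    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Insert value into the sorted array a of n elements: the memmove
       shift is O(N) per insert, but with a small constant. */
    void sorted_insert(int *a, size_t n, int value)
    {
        size_t i = 0;
        while (i < n && a[i] < value)   /* a binary search would also do */
            i++;
        memmove(&a[i + 1], &a[i], (n - i) * sizeof a[0]);
        a[i] = value;
    }

    /* O(log N) lookup once the array is sorted. */
    int *find(int *a, size_t n, int value)
    {
        return bsearch(&value, a, n, sizeof a[0], cmp_int);
    }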

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Is it more efficient to sort a linked list after the values have been inserted into the list, OR to sort each item before inserting it into the list so as to create an ordered list?
    O_o

    You can't search a simple linked list at speed.

    What are you doing that you need the list sorted?

    If it is simple iteration, what are you doing more of: adding elements or iterating over elements?

    If you are doing operations in sufficiently large sets, having two data structures is probably going to be more efficient.

    So, here we are, if you want the "best fit" for your situation, you have to explain your situation and not just tell us what you'd like to use.

    And also, please: what is the most efficient data structure to use for random insertion and searching of unordered data?
    O_o

    There really is no such data structure as "most efficient" for both random insertion and searching unordered data.

    The data structures available with particularly efficient searching impose ordering or have a high random insertion cost.

    The tree recommended by Malcolm McLean imposes order on the data so that searching can be particularly efficient.

    The array recommended by rcgldr imposes no such order but has poor random insertion performance.

    As before, you'll have to give more information to get intelligent recommendations.

    Soma

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Inserting each item in order as you go is O(n*n), so to beat that when sorting after items have just been appended in random order, all you need is a sorting algorithm that is better than O(n*n).
    Merge Sort would do it.
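
    For comparison, the insert-in-order approach being beaten here looks something like this (again assuming an int payload; names invented for the sketch). The linear scan per insert is what makes building the whole list O(n*n):

    Code:
    struct node {
        int value;              /* assumed int payload, for illustration */
        struct node *next;
    };

    /* Insert n into an ascending list: an O(n) scan per insert,
       so building an n-element list this way costs O(n*n) overall. */
    struct node *insert_sorted(struct node *head, struct node *n)
    {
        if (head == NULL || n->value <= head->value) {
            n->next = head;
            return n;
        }
        struct node *cur = head;
        while (cur->next && cur->next->value < n->value)
            cur = cur->next;
        n->next = cur->next;
        cur->next = n;
        return head;
    }
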
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by iMalc View Post
    Inserting each item in order as you go is O(n*n)
    Err.. why would inserting an element into a sorted linked list need quadratic time?
    Or were you talking about an array?

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    The assumption is rather "O(n * m)" where "n" is the number of elements to inspect and "m" is the number of items added.

    Assuming the worst case, you have to inspect every element of the list to figure out where to insert the new item.

    That said, he isn't necessarily correct anyway because we don't know what the original poster is doing.

    The algorithm, as he referenced, is "O(n * m)", which keeps the list sorted as elements are added, but we don't know whether keeping the list sorted is a requirement or just how the actual process happens to work.

    That's a problem because using Merge Sort as suggested could be "O(m * n * log n)" if the process is naive.

    Soma

  8. #8
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by phantomotap View Post
    The array recommended by rcgldr imposes no such order but has poor random insertion performance.
    I was suggesting a sorted array, so there is order (to allow binary search), and as you mentioned the insertion cost is high, except that the memmove I recommended for insertion is fairly fast, and caching will help here.

    Quote Originally Posted by iMalc View Post
    Inserting each item in order as you go is O(n*n)
    Quote Originally Posted by manasij7479 View Post
    Why would inserting an element into a sorted linked list need quadratic time? Or were you talking about an array?
    I think this was in reference to my suggestion of using a sorted array and inserting so that the array remains sorted.

    Quote Originally Posted by phantomotap View Post
    we don't know what the original poster is doing.
    This is the main issue.
    Last edited by rcgldr; 05-17-2013 at 05:29 PM.

  9. #9
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by rcgldr View Post
    I was suggesting a sorted array, so there is order (to allow binary search), and as you mentioned the insertion cost is high, except that the memmove I recommended for insertion is fairly fast, and caching will help here.

    I think this was in reference to my suggestion of using a sorted array and inserting so that the array remains sorted.

    This is the main issue.
    Inserting an element in order into a flat array is O(N). The constant can be pretty low because memmove is usually very efficiently implemented, but you've still got N operations.
    Inserting into a balanced search tree is O(log N). So even when N gets into the billions, it's still only 30 or so operations. Which is why a balanced search tree is the standard answer to this problem. However it's a lot more difficult to implement than the memmove method.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  10. #10
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    I was suggesting a sorted array, so there is order (to allow binary search), and as you mentioned insertion cost is high, except that memmove that I recommended for insertion is fairly fast and cache will help here.
    No, in this case `memmove' isn't "fairly fast" because I was referencing complexity.

    The speed of `memmove' isn't relevant if you need to copy 100 elements, then 101, then 102, and so on until you've added every new element.

    Soma

  11. #11
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by Malcolm McLean View Post
    Inserting an element in order into a flat array is O(N). The constant can be pretty low because memmove is usually very efficiently implemented, but you've still got N operations. Inserting into a balanced search tree is O(log N). So even when N gets into the billions, it's still only 30 or so operations. Which is why a balanced search tree is the standard answer to this problem. However it's a lot more difficult to implement than the memmove method.
    The unknown here is the number of insert operations versus the number of search operations. The sorted array would be targeting the case where the program is mostly doing searches, but at this point we don't know what the original poster is trying to implement.

    One interpretation of the original post is that there is an existing set of data that is to end up as a sorted linked list. In this case, it would be faster to sort the data first, then put it into a sorted linked list. For the linked list part, rather than using the conventional method of appending nodes to a list, an array of nodes could be allocated, and the pointers filled in via iteration.
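
    A sketch of that idea, assuming int values (names invented, error handling omitted): sort first with qsort(), then link one block of nodes in a single pass.

    Code:
    #include <stdlib.h>

    struct node {
        int value;
        struct node *next;
    };

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Sort the values, then build the list from one allocation:
       no per-node malloc and no list traversal while linking. */
    struct node *build_sorted_list(int *values, size_t n)
    {
        if (n == 0)
            return NULL;
        qsort(values, n, sizeof values[0], cmp_int);
        struct node *nodes = malloc(n * sizeof *nodes);  /* check omitted */
        for (size_t i = 0; i < n; i++) {
            nodes[i].value = values[i];
            nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
        }
        return nodes;
    }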
    Last edited by rcgldr; 05-17-2013 at 06:35 PM.

  12. #12
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by phantomotap View Post
    No, in this case `memmove' isn't "fairly fast" because I was referencing complexity.

    The speed of `memmove' isn't relevant if you need to copy 100 elements, then 101, then 102, and so on until you've added every new element.

    Soma
    It's N squared (1 + 2 + 3 + ... + N equals N^2/2 + N/2). N can never grow larger than your total memory.
    So if N is 1,000, then that means a million operations, which at say 1 billion operations per second means 1 millisecond, very likely to be acceptable. If N is a million, that means 1000 billion operations, or 1000 seconds, or about 17 minutes, which users will certainly notice. It all depends on N.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  13. #13
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    So if N is 1,000, then that means a million operations, which at say 1 billion operations per second means 1 millisecond, very likely to be acceptable. If N is a million, that means 1000 billion operations, or 1000 seconds, or about 17 minutes, which users will certainly notice. It all depends on N.
    O_o

    That's a very foolish way of considering algorithms. The purpose of complexity notations is that they allow us to choose reasonably between algorithms and data structures without considering everything in place. We, of course, should test everything in place, but that isn't a common practice. One does not reach for an array by default when considering randomly inserted values for the same reason we don't reach for a trivial "Quick Sort" when we know our data is already mostly sorted. The primary consideration is of the growth factor versus "n" and not of any specific "n". The problem is that your list of examples lacks a growth factor specific example, but I'll provide one for you using your mathematics: 31623 takes only a second whereas 63246 would take four seconds. Using a balanced tree, and your same trivially applied mathematics, you get: 31623 in one second whereas 63246 takes only a single operation more than needed for 31623.

    The complexity is your primary consideration. You may best concern yourself with refinements to the implementation after an appropriate algorithm is chosen. Let's say you, for example, have a `memmove' which costs a more purposeful eleven ticks per element, you may get your 100,000 element array updated over "n" in about two seconds versus ~93000 in the same time for a sloppy implementation of thirteen ticks per element. (That small improvement is provided by the assumption of a good standard library which I'm sure may be provided.) The growth factor of alternative strategies makes this improvement, the "fairly fast" suggestion of `memmove', meaningless.

    So, no, my comment holds and doesn't depend on "n" in any way.

    Here, let me show you. The approach recommended by rcgldr may be the correct one; he directly referenced the conditions on the size and number of elements. (We don't know, but that's not relevant to this post.) If the suggestion by rcgldr turns out to be appropriate, it will indeed be because the size and number of elements are within "spec" and not because `memmove' is "fairly fast", as the speed of `memmove' cannot possibly make up the huge difference in growth factor between the array- and tree-based implementations.

    Soma

  14. #14
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by phantomotap View Post
    O_o

    That's a very foolish way of considering algorithms. The purpose of complexity notations is that they allow us to choose reasonably between algorithms and data structures without considering everything in place. We, of course, should test everything in place, but that isn't a common practice. One does not reach for an array by default when considering randomly inserted values for the same reason we don't reach for a trivial "Quick Sort" when we know our data is already mostly sorted. The primary consideration is of the growth factor versus "n" and not of any specific "n". The problem is that your list of examples lacks a growth factor specific example, but I'll provide one for you using your mathematics: 31623 takes only a second whereas 63246 would take four seconds. Using a balanced tree, and your same trivially applied mathematics, you get: 31623 in one second whereas 63246 takes only a single operation more than needed for 31623.

    The complexity is your primary consideration. You may best concern yourself with refinements to the implementation after an appropriate algorithm is chosen. Let's say you, for example, have a `memmove' which costs a more purposeful eleven ticks per element, you may get your 100,000 element array updated over "n" in about two seconds versus ~93000 in the same time for a sloppy implementation of thirteen ticks per element. (That small improvement is provided by the assumption of a good standard library which I'm sure may be provided.) The growth factor of alternative strategies makes this improvement, the "fairly fast" suggestion of `memmove', meaningless.

    So, no, my comment holds and doesn't depend on "n" in any way.

    Here, let me show you. The approach recommended by rcgldr may be the correct one; he directly referenced the conditions on the size and number of elements. (We don't know, but that's not relevant to this post.) If the suggestion by rcgldr turns out to be appropriate, it will indeed be because the size and number of elements are within "spec" and not because `memmove' is "fairly fast", as the speed of `memmove' cannot possibly make up the huge difference in growth factor between the array- and tree-based implementations.

    Soma

    The big O notation is the most important part of the picture, but it's not the whole picture. You also need to know N. If N is always tiny, then even O(2^N) isn't going to be much slower than O(log N). You need to know N, the constant, and other factors, such as ease of implementation.
    In the sorted-list-under-random-inserts case, a red-black tree is O(log N) for an insert and O(log N) for a search; an array is O(N) for an insert. But the code complexity is far, far greater than for the array solution, and the constant is pretty high. You probably won't see any speed advantage at all until about N = 100. Then up to N = 1000 or so, the O(N^2) method will still come back in under just-noticeable time. Sometimes a process which takes a tenth of a millisecond is a thousand times better than one which takes a tenth of a second; sometimes it merely saves a tenth of a second. Which is the case depends on how your function is being used.
    But as N gets big, the naive method becomes increasingly unviable, so we need to abandon simple code and get the big O complexity down.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  15. #15
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by phantomotap View Post
    The approach recommended by rcgldr may be the correct one; he directly referenced the conditions on the size and number of elements. (We don't know, but that's not relevant to this post.)
    The other condition for the array to be a good choice is that the great majority of operations are search operations, with relatively few insert operations. The array makes the most sense if all the data is available from the start, in which case it's sorted at the start, and afterwards the sorted data is just searched (used like a dictionary or spelling checker). The key issue is that we don't know what the original poster needs, so all of this is speculation. Perhaps the original poster just wants a linked list sort as a learning exercise, as opposed to trying to solve a particular problem.

