# Thread: Binary Search Tree sort

1. ## Binary Search Tree sort

Hey, I'm working on a BST Sort and I don't know why this code is not running the way I want. Actually, I have no compilation errors neither warnings, it's just doesn't match with the attended result

Code:
```#include <stdio.h>
#include<stdlib.h>

struct Node{
int value;
struct Node *left, *right;
};

struct Node *newNode(int value){
struct Node *node = malloc(sizeof *node);
node->value = value;
node->left = NULL;
node->right = NULL;
return node;
}

struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if(value < node->value)
node->left = insert(node->left, value);
else if (value > node->value)
node->right = insert(node->right, value);
return node;
}

void collect(struct Node *root, int T[], int i){
if(root != NULL){
collect(root->left, T, i);
T[i++] = root->value;
collect(root->right, T, i);
}
}

void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
root = insert(root, T);
int i;
for(i = 1; i < n; i++)
insert(root, T[i]);
i = 0;
collect(root, T, i);
}

int main(void){
int TAB[] = {47, 39, 1, 45, 11};
int n = sizeof(TAB)/sizeof(TAB);
binary_tree_sort(TAB, n);
int i;
for(i = 0; i < n; i++)
printf("%d ", TAB[i]);
return 0;
}```
It should have returned 1 11 39 45 47
However it returns the original array 47 39 1 45 11

I must say that I'm a bit confused  2. You're not updating your index in the collect function.
Pass a pointer.
Code:
```void collect(struct Node *root, int T[], int *i){
if(root != NULL){
collect(root->left, T, i);
T[(*i)++] = root->value;
collect(root->right, T, i);
}
}

void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
int i;
for(i = 0; i < n; i++)
root = insert(root, T[i]);
i = 0;
collect(root, T, &i);
}```
Or use the return value:

Code:
```int collect(struct Node *root, int T[], int i){
if(root != NULL){
i = collect(root->left, T, i);
T[i] = root->value;
i = collect(root->right, T, i + 1);
}
return i;
}

void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
int i;
for(i = 0; i < n; i++)
root = insert(root, T[i]);
collect(root, T, 0);
}``` 3. Originally Posted by john.c You're not updating your index in the collect function.
Pass a pointer.
Code:
```void collect(struct Node *root, int T[], int *i){
if(root != NULL){
collect(root->left, T, i);
T[(*i)++] = root->value;
collect(root->right, T, i);
}
}

void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
int i;
for(i = 0; i < n; i++)
root = insert(root, T[i]);
i = 0;
collect(root, T, &i);
}```
Or use the return value:

Code:
```int collect(struct Node *root, int T[], int i){
if(root != NULL){
i = collect(root->left, T, i);
T[i] = root->value;
i = collect(root->right, T, i + 1);
}
return i;
}

void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
int i;
for(i = 0; i < n; i++)
root = insert(root, T[i]);
collect(root, T, 0);
}```
Ok thank you very much, you saved me one more time !! By the way, when same values in the given array, the program puts one of these two at the end of the array. The problem comes from the insert function, because it doesn't treat equal values at all. I simply fixed it this way :

Code:
```struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if(value <= node->value)
node->gauche = insert(node->left, value);
else if (value > node->value)
node->right = insert(node->right, value);
return node;
}``` 4. Originally Posted by ElPosto Ok thank you very much, you saved me one more time !!
ElPosto! The Postman! Postin' away!
No prob, man.  5. Actually, I thought I've fixed the problem of same values in the given array but now when I run the program for size of 200,000 elements, it takes more than 13 seconds to treat the array !! (although it's considered one of the quickest sort). At least the result is correct but...

Code:
```struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if(value <= node->value)
node->left = insert(node->left, value);
else if (value > node->value)
node->right = insert(node->right, value);
return node;
}```
Whereas with last writing it took 0.6 secs to run but the result wasn't correct, it pushed some values at the end of the array, as if we have inserted too much elements in the array. For example with a = [26 48 25 0 97 46 69 16 77 80 16 90 63 0 7] it returned --> 0 7 16 25 26 46 48 63 69 77 80 90 97 0 7

Code:
```struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if(value < node->value)
node->left = insert(node->left, value);
else if (value > node->value)
node->right = insert(node->right, value);
return node;
}```

So I'm a bit confused and don't have any clue to explain this big time difference and to have the correct function with the minimum time. I think the problem comes from the line in the binary tree sort function :

Code:
```for(i = 0; i < n; i++)
insert(root, T[i]);```
Even when adopting your writing, problems remain with odd sizes (N = 15) for example where a value is pushed at the end of the array

Code:
```void binary_tree_sort(int T[], int n){
struct Node *root = NULL;
int i;
for(i = 0; i < n; i++)
root = insert(root, T[i]);
collect(root, T, 0);
}

```
So.. I must admit It's quite disturbing. I tried to "run" the program on a sheet of paper to make it function mentally but I haven't any clue to explain this problem of time difference either to solve it.

I hoped you could help me one more (last, I promess ) time...

Here is the complete source code:

Code:
```#include <stdio.h>
#include<stdlib.h>
#define N 15

typedef int ARRAY[N];

void display_arr(ARRAY A, int n){
int i;
for(i = 0; i < n; i++)
printf("%d ", A[i]);
printf("\n\n");
}

struct Node{
int value;
struct Node *left, *right;
};

struct Node *newNode(int value){
struct Node *node = malloc(sizeof *node);
node->value = value;
node->left = NULL;
node->right = NULL;
return node;
}

struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if(value < node->value)
node->left = insert(node->left, value);
else if (value > node->value)
node->right = insert(node->right, value);
return node;
}

voidcollect(structNode *root, ARRAY A, int *i){
if(root != NULL){
collect(root->left, A, i);
A[(*i)++] = root->value;
collect(root->right, A, i);
}
}

voidbinary_tree_sort(ARRAY A, intn){
struct Node *root = NULL;
root = insert(root, A);
int i;
for(i = 1; i < n; i++)
insert(root, A[i]);
i = 0;
collect(root, A, &i);
}

int main(void){
ARRAY A;
int r;
srand(time(0));
for(r = 0; r < N; r++)
A[r] = ((unsigned int)rand()%100);
printf("Array to sort :  ");
display_arr(A, N);
binary_tree_sort(A, N);
printf("Array sorted :  ");
display_arr(A, N);
return 0;
}
``` 6. Your insert function ignores duplicates. Try:
Code:
```struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if (value < node->value)
node->left = insert(node->left, value);
else
node->right = insert(node->right, value);
return node;
}```
As for the time taken, I don't see why this sort would be considered particularly quick. To sort an array of 200000 elements, it would need to call malloc 200000 times! 7. Originally Posted by john.c Your insert function ignores duplicates. Try:
Code:
```struct Node *insert(struct Node *node, int value){
if(node == NULL)
return newNode(value);
if (value < node->value)
node->left = insert(node->left, value);
else
node->right = insert(node->right, value);
return node;
}```
As for the time taken, I don't see why this sort would be considered particularly quick. To sort an array of 200000 elements, it would need to call malloc 200000 times!
But why adding equality condition decreases time treatment so far ?

With 200,000 elements and without
Code:
```if (value <= node->value)
node->left = insert(node->left, value);```
but
Code:
```if (value < node->value)
node->left = insert(node->left, value);```
it takes 0,6 secs to reply !

This sort is in O(nlog(n)), same as the quicksort but the quicksort I've implemented doesn't treat an array in 13 secs ! 8. If you are inserting 200000 random numbers between 0 and 99 then there are going to be a lot of duplicates! If you are ignoring the duplicates, you are only inserting the values 0 to 99 to your tree. Basically, you're only sorting an array of 100 elements. Although the tree still needs to be searched to find the first duplicate for all 200000 inputs, it will find a duplicate much sooner then it will find the correct place to insert the element.

You could try just using rand() instead of rand() % 100 so there won't be many duplicates.

At any rate, tree sort has a BEST CASE complexity of nlogn, but a worst case of n-squared (for an unbalanced tree). You need to use a balanced tree (e.g., red-black tree) to get the best case. 9. Originally Posted by john.c If you are inserting 200000 random numbers between 0 and 99 then there are going to be a lot of duplicates! If you are ignoring the duplicates, you are only inserting the values 0 to 99 to your tree. Basically, you're only sorting an array of 100 elements. Although the tree still needs to be searched to find the first duplicate for all 200000 inputs, it will find a duplicate much sooner then it will find the correct place to insert the element.

You could try just using rand() instead of rand() % 100 so there won't be many duplicates.

At any rate, tree sort has a BEST CASE complexity of nlogn, but a worst case of n-squared (for an unbalanced tree). You need to use a balanced tree (e.g., red-black tree) to get the best case.
I am a complete idiot, I forget to pull out rand %100. As you say it, it doesn't surprises me now that it takes so much time ! !

+Thank you very much for all your pieces of information ! We see that you know what you're talking about and it is very pleasant ! 10. If the intermediate state of the tree is not needed until all nodes are inserted, I'm thinking it could be faster to create a degenerate tree ("vines" - only using the "right" pointers), effectively using it as a linked list and performing a bottom up merge sort for linked list which uses a small internal array of lists. The nodes are merged into the array one at a time, then the array is merged to form a single sorted list, which could then be converted into a balanced binary search tree (vines to tree). Wiki article for bottom up merge sort for linked list:

Merge sort - Wikipedia 11. Actually, the BST sort works pretty well (average of 0.644 to sort a 500,000 elements array) but I noticed that all the sorting algorithms I have implemented have a limit (probably memory). For example, for :

• the bubble sort
• the sequently-insertion sort
• the dichotomic insertion sort
• the selection sort
• the BST sort

If I treat arrays of more than 500,000 elements, the program crashes

For the quicksort, it is 400,000 elements, when my merge sort and Heapsort can't do over 200,000 elements. I mean every time I run the program with number of elements going over these limits, it stops and I've the message "my_sort.exe stops to run"

It's quite disturbing because I have to test values until 1,000,000 elements but I think it's a problem of software environment & system. I'm working with Code::Blocks on Windows

Do you have any solution to solve this problem ?

Here are my dichotomic insertion sort and merge sort for example, for a treatment of an only array (with chosen size) :

Dichotomic insertion sort :

Code:
```#include <stdio.h>
#include <stdlib.h>
#define N 15  /*The value I can't make the program run over 500,000*/

typedef int ARRAY[N];

void display_arr(ARRAY A, int n){
int i;
for(i = 0; i < n; i++)
printf("%d ", A[i]);
printf("\n\n");
}

int position(ARRAY A, int i){
int left = 0, right = i-1, middle;
while (left <= right){
middle = (left + right) / 2;
if (A[i] > A[middle])
left = middle + 1;
else right = middle - 1;
}
return left;
}

void translation(ARRAY A, int p, int i){
int j;
for(j = i-1; p <= j; j--)
A[j+1] = A[j];
}

void dic_insertion_sort(ARRAY A, int n){
int i, p, x;
for(i = 1; i < n; i++){
p = position(A, i);
x = A[i];
translation(A, p, i);
A[p] = x;
}
}

int main(void){
ARRAY A;
int r;
srand(time(0));
for(r = 0; r < N; r++)
A[r] = (unsigned int)rand();
printf("Array to sort :  ");
display_arr(A, N);
dic_insertion_sort(A, N);
printf("Array sorted :  ");
display_arr(A, N);
return 0;
}```
Merge sort :

Code:
```#include <stdio.h>
#include <stdlib.h>
#define N 15  /*The value I can't make the program go over 200,000

typedef int ARRAY[N];

void display_arr(ARRAY A, int n){
int i;
for(i = 0; i < n; i++)
printf("%d ", A[i]);
printf("\n\n");
}

void merge(ARRAY A, int left, int middle, int right){
ARRAY TEMP;
int i = left, j = middle + 1, k = 0;
while(i <= middle && j <= right){
if(A[i] <= A[j])
TEMP[k++] = A[i++];
else
TEMP[k++] = A[j++];
}
while(i <= middle)
TEMP[k++] = A[i++];
while(j <= right)
TEMP[k++] = A[j++];
int c;
for(c = 0; c < right-left+1; c++)
A[left+c] = TEMP[c];
}

void merge_sort_gen(ARRAY A, int left, int right){
int middle;
if(left < right){
middle = (left + right) / 2;
merge_sort_gen(A, left, middle);
merge_sort_gen(A, middle + 1, right);
merge(A, left, middle, right);
}
}

void merge_sort(ARRAY A, int n){
merge_sort_gen(A, 0, n-1);
}

int main(void){
ARRAY A;
int r;
srand(time(0));
for(r = 0; r < N; r++)
A[r] = (unsigned int)rand();
printf("Array to sort :  ");
display_arr(A, N);
merge_sort(A, N);
printf("Array sorted :  ");
display_arr(A, N);
return 0;
}``` 12. If I treat arrays of more than 500,000 elements, the program crashes
You're probably running out of stack space since the array is declared inside a function. You will probably need to use dynamic memory or arrays that have static storage durations.

By the way I recommend you get rid of that typedef (typedef int ARRAY[N]) and say what you mean when you mean it. 13. Originally Posted by jimblumberg You're probably running out of stack space since the array is declared inside a function. You will probably need to use dynamic memory or arrays that have static storage durations.

By the way I recommend you get rid of that typedef (typedef int ARRAY[N]) and say what you mean when you mean it.
I just followed the instructions given to me, even if I know it's completely stupid using some closed structures with such sizes of data. Can you show me of to do that, with the examples I've given below ? I'm not used to use dynamic variables  14. The easiest way to allow the allocation of a large object is to allocate the large object statically.
Code:
```int main(void) {
static ARRAY a;```
Dynamic allocation is better, though, since it allows you to let the user choose the size. Get rid of the ARRAY typedef and just accept the array as an int* in functions.
Code:
```int main(int argc, char **argv) {
if (argc != 2) {
fprintf(stderr, "Usage: treesort ARRAYSIZE\n");
exit(EXIT_FAILURE);
}

int size = atoi(argv);
if (size < 1) {
fprintf(stderr, "Error: ARRAYSIZE must be > 0\n");
exit(EXIT_FAILURE);
}

int *a = malloc(size * sizeof *a);
if (a == NULL) {
perror("Error: malloc");
exit(EXIT_FAILURE);
}

// ...

free(a);
return 0;
}``` 15. I just followed the instructions given to me
What instructions? IMO, that typedef is horrible and all it really does is aid in confusion.

Can you show me of to do that, with the examples I've given below ?
What examples are you talking about, I don't see anything "below"?

The following is probably a good reason using that typdef is confusing:

Code:
```void merge(ARRAY A, int left, int middle, int right){
ARRAY TEMP;
...```
What is the size of the TEMP array?

Why is it that size?

If you don't understand dynamic memory then perhaps you shouldn't be using such large numbers for your array sizes. Popular pages Recent additions int, node, return, struct, value; 