Thread: Binary Search Tree sort

  1. #1
    Registered User
    Join Date
    Dec 2017
    Posts
    24

    Binary Search Tree sort

    Hey, I'm working on a BST Sort and I don't know why this code is not running the way I want. Actually, I have no compilation errors neither warnings, it's just doesn't match with the attended result

    Code:
    #include <stdio.h>
    #include<stdlib.h>
    
    
    struct Node{
        int value;
        struct Node *left, *right;
    };
    
    
    struct Node *newNode(int value){
        struct Node *node = malloc(sizeof *node);
        node->value = value;
        node->left = NULL;
        node->right = NULL;
        return node;
    }
    
    
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if(value < node->value)
            node->left = insert(node->left, value);
        else if (value > node->value)
            node->right = insert(node->right, value);
        return node;
    }
    
    
    void collect(struct Node *root, int T[], int i){
        if(root != NULL){
            collect(root->left, T, i);
            T[i++] = root->value;
            collect(root->right, T, i);
        }
    }
    
    
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        root = insert(root, T[0]);
        int i;
        for(i = 1; i < n; i++)
            insert(root, T[i]);
        i = 0;
        collect(root, T, i);
    }
    
    
    int main(void){
        int TAB[] = {47, 39, 1, 45, 11};
        int n = sizeof(TAB)/sizeof(TAB[0]);
        binary_tree_sort(TAB, n);
        int i;
        for(i = 0; i < n; i++)
            printf("%d ", TAB[i]);
        return 0;
    }
    It should have returned 1 11 39 45 47
    However it returns the original array 47 39 1 45 11

    I must say that I'm a bit confused
    Last edited by ElPosto; 12-27-2017 at 06:23 PM.

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    632
    You're not updating your index in the collect function.
    Pass a pointer.
    Code:
    void collect(struct Node *root, int T[], int *i){
        if(root != NULL){
            collect(root->left, T, i);
            T[(*i)++] = root->value;
            collect(root->right, T, i);
        }
    }
    
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        int i;
        for(i = 0; i < n; i++)
            root = insert(root, T[i]);
        i = 0;
        collect(root, T, &i);
    }
    Or use the return value:

    Code:
    int collect(struct Node *root, int T[], int i){
        if(root != NULL){
            i = collect(root->left, T, i);
            T[i] = root->value;
            i = collect(root->right, T, i + 1);
        }
        return i;
    }
    
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        int i;
        for(i = 0; i < n; i++)
            root = insert(root, T[i]);
        collect(root, T, 0);
    }
    Last edited by john.c; 12-27-2017 at 07:19 PM.
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  3. #3
    Registered User
    Join Date
    Dec 2017
    Posts
    24
    Quote Originally Posted by john.c View Post
    You're not updating your index in the collect function.
    Pass a pointer.
    Code:
    void collect(struct Node *root, int T[], int *i){
        if(root != NULL){
            collect(root->left, T, i);
            T[(*i)++] = root->value;
            collect(root->right, T, i);
        }
    }
    
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        int i;
        for(i = 0; i < n; i++)
            root = insert(root, T[i]);
        i = 0;
        collect(root, T, &i);
    }
    Or use the return value:

    Code:
    int collect(struct Node *root, int T[], int i){
        if(root != NULL){
            i = collect(root->left, T, i);
            T[i] = root->value;
            i = collect(root->right, T, i + 1);
        }
        return i;
    }
    
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        int i;
        for(i = 0; i < n; i++)
            root = insert(root, T[i]);
        collect(root, T, 0);
    }
    Ok thank you very much, you saved me one more time !! By the way, when same values in the given array, the program puts one of these two at the end of the array. The problem comes from the insert function, because it doesn't treat equal values at all. I simply fixed it this way :

    Code:
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if(value <= node->value)
            node->gauche = insert(node->left, value);
        else if (value > node->value)
            node->right = insert(node->right, value);
        return node;
    }
    Last edited by ElPosto; 12-29-2017 at 12:06 PM.

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    632
    Quote Originally Posted by ElPosto View Post
    Ok thank you very much, you saved me one more time !!
    ElPosto! The Postman! Postin' away!
    No prob, man.
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  5. #5
    Registered User
    Join Date
    Dec 2017
    Posts
    24


    Actually, I thought I've fixed the problem of same values in the given array but now when I run the program for size of 200,000 elements, it takes more than 13 seconds to treat the array !! (although it's considered one of the quickest sort). At least the result is correct but...


    Code:
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if(value <= node->value)
            node->left = insert(node->left, value);
        else if (value > node->value)
            node->right = insert(node->right, value);
        return node;
    }
    Whereas with last writing it took 0.6 secs to run but the result wasn't correct, it pushed some values at the end of the array, as if we have inserted too much elements in the array. For example with a = [26 48 25 0 97 46 69 16 77 80 16 90 63 0 7] it returned --> 0 7 16 25 26 46 48 63 69 77 80 90 97 0 7

    Code:
    struct Node *insert(struct Node *node, int value){
       if(node == NULL)
            return newNode(value);
       if(value < node->value)
           node->left = insert(node->left, value);
       else if (value > node->value)
           node->right = insert(node->right, value);
       return node;
    }


    So I'm a bit confused and don't have any clue to explain this big time difference and to have the correct function with the minimum time. I think the problem comes from the line in the binary tree sort function :

    Code:
    for(i = 0; i < n; i++)
            insert(root, T[i]);
    Even when adopting your writing, problems remain with odd sizes (N = 15) for example where a value is pushed at the end of the array

    Code:
    void binary_tree_sort(int T[], int n){
        struct Node *root = NULL;
        int i;
        for(i = 0; i < n; i++)
            root = insert(root, T[i]);
        collect(root, T, 0);
    }
    
    
    So.. I must admit It's quite disturbing. I tried to "run" the program on a sheet of paper to make it function mentally but I haven't any clue to explain this problem of time difference either to solve it.

    I hoped you could help me one more (last, I promess ) time...

    Here is the complete source code:

    Code:
    #include <stdio.h>
    #include<stdlib.h>
    #define N 15
    
    
    typedef int ARRAY[N];
    
    
    void display_arr(ARRAY A, int n){
        int i;
        for(i = 0; i < n; i++)
            printf("%d ", A[i]);
            printf("\n\n");
    }
    
    
    struct Node{
        int value;
        struct Node *left, *right;
    };
    
    
    struct Node *newNode(int value){
        struct Node *node = malloc(sizeof *node);
        node->value = value;
        node->left = NULL;
        node->right = NULL;
        return node;
    }
    
    
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if(value < node->value)
            node->left = insert(node->left, value);
        else if (value > node->value)
            node->right = insert(node->right, value);
        return node;
    }
    
    
    voidcollect(structNode *root, ARRAY A, int *i){
        if(root != NULL){
            collect(root->left, A, i);
            A[(*i)++] = root->value;
            collect(root->right, A, i);
        }
    }
    
    
    voidbinary_tree_sort(ARRAY A, intn){
        struct Node *root = NULL;
        root = insert(root, A[0]);
        int i;
        for(i = 1; i < n; i++)
            insert(root, A[i]);
        i = 0;
        collect(root, A, &i);
    }
     
    int main(void){
        ARRAY A;
        int r;
        srand(time(0));
        for(r = 0; r < N; r++)
            A[r] = ((unsigned int)rand()%100);
        printf("Array to sort :  ");
        display_arr(A, N);
        binary_tree_sort(A, N);
        printf("Array sorted :  ");
        display_arr(A, N);
        return 0;
    }
    
    Last edited by ElPosto; 12-30-2017 at 11:42 AM.

  6. #6
    Registered User
    Join Date
    Dec 2017
    Posts
    632
    Your insert function ignores duplicates. Try:
    Code:
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if (value < node->value)
            node->left = insert(node->left, value);
        else
            node->right = insert(node->right, value);
        return node;
    }
    As for the time taken, I don't see why this sort would be considered particularly quick. To sort an array of 200000 elements, it would need to call malloc 200000 times!
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  7. #7
    Registered User
    Join Date
    Dec 2017
    Posts
    24
    Quote Originally Posted by john.c View Post
    Your insert function ignores duplicates. Try:
    Code:
    struct Node *insert(struct Node *node, int value){
        if(node == NULL)
             return newNode(value);
        if (value < node->value)
            node->left = insert(node->left, value);
        else
            node->right = insert(node->right, value);
        return node;
    }
    As for the time taken, I don't see why this sort would be considered particularly quick. To sort an array of 200000 elements, it would need to call malloc 200000 times!
    But why adding equality condition decreases time treatment so far ?

    With 200,000 elements and without
    Code:
    if (value <= node->value)
            node->left = insert(node->left, value);
    but
    Code:
    if (value < node->value)
            node->left = insert(node->left, value);
    it takes 0,6 secs to reply !

    This sort is in O(nlog(n)), same as the quicksort but the quicksort I've implemented doesn't treat an array in 13 secs !
    Last edited by ElPosto; 12-30-2017 at 01:38 PM.

  8. #8
    Registered User
    Join Date
    Dec 2017
    Posts
    632
    If you are inserting 200000 random numbers between 0 and 99 then there are going to be a lot of duplicates! If you are ignoring the duplicates, you are only inserting the values 0 to 99 to your tree. Basically, you're only sorting an array of 100 elements. Although the tree still needs to be searched to find the first duplicate for all 200000 inputs, it will find a duplicate much sooner then it will find the correct place to insert the element.

    You could try just using rand() instead of rand() % 100 so there won't be many duplicates.

    At any rate, tree sort has a BEST CASE complexity of nlogn, but a worst case of n-squared (for an unbalanced tree). You need to use a balanced tree (e.g., red-black tree) to get the best case.
    Last edited by john.c; 12-30-2017 at 01:43 PM.
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  9. #9
    Registered User
    Join Date
    Dec 2017
    Posts
    24
    Quote Originally Posted by john.c View Post
    If you are inserting 200000 random numbers between 0 and 99 then there are going to be a lot of duplicates! If you are ignoring the duplicates, you are only inserting the values 0 to 99 to your tree. Basically, you're only sorting an array of 100 elements. Although the tree still needs to be searched to find the first duplicate for all 200000 inputs, it will find a duplicate much sooner then it will find the correct place to insert the element.

    You could try just using rand() instead of rand() % 100 so there won't be many duplicates.

    At any rate, tree sort has a BEST CASE complexity of nlogn, but a worst case of n-squared (for an unbalanced tree). You need to use a balanced tree (e.g., red-black tree) to get the best case.
    I am a complete idiot, I forget to pull out rand %100. As you say it, it doesn't surprises me now that it takes so much time ! !

    +Thank you very much for all your pieces of information ! We see that you know what you're talking about and it is very pleasant !

  10. #10
    Registered User
    Join Date
    Apr 2013
    Posts
    1,646
    If the intermediate state of the tree is not needed until all nodes are inserted, I'm thinking it could be faster to create a degenerate tree ("vines" - only using the "right" pointers), effectively using it as a linked list and performing a bottom up merge sort for linked list which uses a small internal array of lists. The nodes are merged into the array one at a time, then the array is merged to form a single sorted list, which could then be converted into a balanced binary search tree (vines to tree). Wiki article for bottom up merge sort for linked list:

    Merge sort - Wikipedia
    Last edited by rcgldr; 12-30-2017 at 06:17 PM.

  11. #11
    Registered User
    Join Date
    Dec 2017
    Posts
    24
    Actually, the BST sort works pretty well (average of 0.644 to sort a 500,000 elements array) but I noticed that all the sorting algorithms I have implemented have a limit (probably memory). For example, for :


    • the bubble sort
    • the sequently-insertion sort
    • the dichotomic insertion sort
    • the selection sort
    • the BST sort


    If I treat arrays of more than 500,000 elements, the program crashes

    For the quicksort, it is 400,000 elements, when my merge sort and Heapsort can't do over 200,000 elements. I mean every time I run the program with number of elements going over these limits, it stops and I've the message "my_sort.exe stops to run"

    It's quite disturbing because I have to test values until 1,000,000 elements but I think it's a problem of software environment & system. I'm working with Code::Blocks on Windows

    Do you have any solution to solve this problem ?

    Here are my dichotomic insertion sort and merge sort for example, for a treatment of an only array (with chosen size) :

    Dichotomic insertion sort :

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #define N 15  /*The value I can't make the program run over 500,000*/
    
    
    typedef int ARRAY[N];
    
    
    void display_arr(ARRAY A, int n){
        int i;
        for(i = 0; i < n; i++)
            printf("%d ", A[i]);
        printf("\n\n");
    }
    
    
    int position(ARRAY A, int i){
        int left = 0, right = i-1, middle;
        while (left <= right){
            middle = (left + right) / 2;
            if (A[i] > A[middle])
                left = middle + 1;
            else right = middle - 1;
        }
        return left;
    }
    
    
    void translation(ARRAY A, int p, int i){
        int j;
        for(j = i-1; p <= j; j--)
            A[j+1] = A[j];
    }
    
    
    void dic_insertion_sort(ARRAY A, int n){
        int i, p, x;
        for(i = 1; i < n; i++){
            p = position(A, i);
            x = A[i];
            translation(A, p, i);
            A[p] = x;
        }
    }
    
    
    int main(void){
        ARRAY A;
        int r;
        srand(time(0));
        for(r = 0; r < N; r++)
            A[r] = (unsigned int)rand();
        printf("Array to sort :  ");
        display_arr(A, N);
        dic_insertion_sort(A, N);
        printf("Array sorted :  ");
        display_arr(A, N);
        return 0;
    }
    Merge sort :

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #define N 15  /*The value I can't make the program go over 200,000
    
    
    typedef int ARRAY[N];
    
    
    void display_arr(ARRAY A, int n){
        int i;
        for(i = 0; i < n; i++)
            printf("%d ", A[i]);
        printf("\n\n");
    }
    
    
    void merge(ARRAY A, int left, int middle, int right){
        ARRAY TEMP;
        int i = left, j = middle + 1, k = 0;
        while(i <= middle && j <= right){
            if(A[i] <= A[j])
                TEMP[k++] = A[i++];
            else
                TEMP[k++] = A[j++];
        }
        while(i <= middle)
            TEMP[k++] = A[i++];
        while(j <= right)
            TEMP[k++] = A[j++];
        int c;
        for(c = 0; c < right-left+1; c++)
            A[left+c] = TEMP[c];
    }
    
    
    void merge_sort_gen(ARRAY A, int left, int right){
        int middle;
        if(left < right){
            middle = (left + right) / 2;
            merge_sort_gen(A, left, middle);
            merge_sort_gen(A, middle + 1, right);
            merge(A, left, middle, right);
        }
    }
    
    
    void merge_sort(ARRAY A, int n){
        merge_sort_gen(A, 0, n-1);
    }
    
    
    int main(void){
        ARRAY A;
        int r;
        srand(time(0));
        for(r = 0; r < N; r++)
            A[r] = (unsigned int)rand();
        printf("Array to sort :  ");
        display_arr(A, N);
        merge_sort(A, N);
        printf("Array sorted :  ");
        display_arr(A, N);
        return 0;
    }
    Last edited by ElPosto; 12-31-2017 at 10:14 AM.

  12. #12
    Registered User
    Join Date
    May 2010
    Posts
    4,450
    If I treat arrays of more than 500,000 elements, the program crashes
    You're probably running out of stack space since the array is declared inside a function. You will probably need to use dynamic memory or arrays that have static storage durations.

    By the way I recommend you get rid of that typedef (typedef int ARRAY[N]) and say what you mean when you mean it.

  13. #13
    Registered User
    Join Date
    Dec 2017
    Posts
    24
    Quote Originally Posted by jimblumberg View Post
    You're probably running out of stack space since the array is declared inside a function. You will probably need to use dynamic memory or arrays that have static storage durations.

    By the way I recommend you get rid of that typedef (typedef int ARRAY[N]) and say what you mean when you mean it.
    I just followed the instructions given to me, even if I know it's completely stupid using some closed structures with such sizes of data. Can you show me of to do that, with the examples I've given below ? I'm not used to use dynamic variables

  14. #14
    Registered User
    Join Date
    Dec 2017
    Posts
    632
    The easiest way to allow the allocation of a large object is to allocate the large object statically.
    Code:
    int main(void) {
        static ARRAY a;
    Dynamic allocation is better, though, since it allows you to let the user choose the size. Get rid of the ARRAY typedef and just accept the array as an int* in functions.
    Code:
    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "Usage: treesort ARRAYSIZE\n");
            exit(EXIT_FAILURE);
        }
    
        int size = atoi(argv[1]);
        if (size < 1) {
            fprintf(stderr, "Error: ARRAYSIZE must be > 0\n");
            exit(EXIT_FAILURE);
        }
    
        int *a = malloc(size * sizeof *a);
        if (a == NULL) {
            perror("Error: malloc");
            exit(EXIT_FAILURE);
        }
    
        // ...
    
        free(a);
        return 0;
    }
    The world hangs on a thin thread, and that is the psyche of man. - Carl Jung

  15. #15
    Registered User
    Join Date
    May 2010
    Posts
    4,450
    I just followed the instructions given to me
    What instructions? IMO, that typedef is horrible and all it really does is aid in confusion.

    Can you show me of to do that, with the examples I've given below ?
    What examples are you talking about, I don't see anything "below"?

    The following is probably a good reason using that typdef is confusing:

    Code:
    void merge(ARRAY A, int left, int middle, int right){
        ARRAY TEMP;
    ...
    What is the size of the TEMP array?

    Why is it that size?



    If you don't understand dynamic memory then perhaps you shouldn't be using such large numbers for your array sizes.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Binary Search Tree-search method help?
    By shocklightning in forum C++ Programming
    Replies: 5
    Last Post: 03-25-2012, 10:57 PM
  2. binary search tree
    By angel34 in forum C Programming
    Replies: 0
    Last Post: 12-10-2010, 03:32 PM
  3. Replies: 0
    Last Post: 11-04-2006, 11:07 AM
  4. A Binary Search Tree of... Binary Search Trees...
    By SlyMaelstrom in forum C++ Programming
    Replies: 5
    Last Post: 12-10-2005, 02:12 PM
  5. Search Engine - Binary Search Tree
    By Gecko2099 in forum C Programming
    Replies: 9
    Last Post: 04-17-2005, 02:56 PM

Tags for this Thread