Thread: Quicksort help!

  1. #1
    Registered User
    Join Date
    Feb 2012
    Posts
    7

    Quicksort help!

    Hi all,

    I have written a quicksort algo that works fine on a array of ints. My problem however is that I will be sorting a few millions structs based on its int member.

    My task has to split the few millions structs into sections of 750. So I will have 750 structs then list[1] will be another 750 structs and then list[2] will have another 750 all the way to a few million. So my question is how can I sort the array when it is structured in this seperated way, or can I just create the structs in one large array say struct MyList mine[4000000] and then input the sorted data into the seperated sections.

    I hope this is clear, thanks.

    A bit more explanation, I would have a structure like:

    list[10]
    section[6]
    struct obj->int value to sort by

    So in this case there would be 10 entries of 6 structs. So 60 obj structs all together that need to be sorted by their int value. However in real scale there would be maybe 3-4 million obj structs, so I dont think I can do struct obj list[3000000] and sort that, I would have to sort it in its seperated form.
    Last edited by zeronero; 02-17-2012 at 04:08 AM.

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Ok, how do you actually want it sorted? Use this example:

    1 2 3 4 5 6 7 8

    You have eight numbers, in random order. You have 2 buckets (not 750). Now how do they end up?

    1 2 3 4
    5 6 7 1

    1 3 5 7
    2 4 6 8

    Or what?

    How do you decide which ones go in what bucket? Do they all arrive to you at the same time? Do you have all the data at once? Does it trickle in, or what? You need to think about all of that, before you actually worry about writing the sorting code.


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Quote Originally Posted by quzah View Post
    Ok, how do you actually want it sorted? Use this example:

    1 2 3 4 5 6 7 8

    You have eight numbers, in random order. You have 2 buckets (not 750). Now how do they end up?

    1 2 3 4
    5 6 7 1

    1 3 5 7
    2 4 6 8

    Or what?

    How do you decide which ones go in what bucket? Do they all arrive to you at the same time? Do you have all the data at once? Does it trickle in, or what? You need to think about all of that, before you actually worry about writing the sorting code.


    Quzah.
    Hi, I will read the numbers from a file and load them all into the structure, they will be in total random order so I want to sort them sequentially so it would become:

    1234 - bucket 1
    5678 - bucket 2
    etc...

    At the beginning it will be random so:

    5634 - bucket 1
    2187 - bucket 2

  4. #4
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    The question is whether the multiple separate lists are alright being sorted each on their own independently, or whether you want all 3000000 structs in order. You haven't stated that clearly.

  5. #5
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Quote Originally Posted by nonoob View Post
    The question is whether the multiple separate lists are alright being sorted each on their own independently, or whether you want all 3000000 structs in order. You haven't stated that clearly.
    Hi Yes I would like all 3000000 structs in order.

  6. #6
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Like at the moment I am trying to get it to work with an array of 3 buckets of 4 objects. I am getting lost in it thought it doesn't seem to be picking the right elements to swap at all!

  7. #7
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    #define BUCKETS 750
    #define BINS 4000
    #define INVALIDBINORBUCKET (BUCKETS*BINS)
    #define WHATBUCKET(x) ((x)/BINS)
    #define WHATBIN(x) ((x)%BINS)
    ...
    type put( type**data, type object, type where )
    {
        if( where > 0 && where < INVALIDBINORBUCKET )
        {
            type bin = WHATBIN( where );
            type bucket = WHATBUCKET( where );
            data[ bucket ][ bin ] = object;
            return 0;
        }
        return INVALIDBINORBUCKET;
    }
    This assumes you're making a dynamic 2D array of objects. You could have just as easily went 1D.

    You may want to place things where you think they should be as you read them in. Take some value, decide where that value should roughly fall, and stick it in that bin/bucket. Then later run through and sort the whole thing (I guess).


    Quzah.
    Hope is the first step on the road to disappointment.

  8. #8
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Quote Originally Posted by quzah View Post
    Code:
    #define BUCKETS 750
    #define BINS 4000
    #define INVALIDBINORBUCKET (BUCKETS*BINS)
    #define WHATBUCKET(x) ((x)/BINS)
    #define WHATBIN(x) ((x)%BINS)
    ...
    type put( type**data, type object, type where )
    {
        if( where > 0 && where < INVALIDBINORBUCKET )
        {
            type bin = WHATBIN( where );
            type bucket = WHATBUCKET( where );
            data[ bucket ][ bin ] = object;
            return 0;
        }
        return INVALIDBINORBUCKET;
    }
    This assumes you're making a dynamic 2D array of objects. You could have just as easily went 1D.

    You may want to place things where you think they should be as you read them in. Take some value, decide where that value should roughly fall, and stick it in that bin/bucket. Then later run through and sort the whole thing (I guess).


    Quzah.
    Hi Quzah,

    Thanks for the response. Your code does not do any sorting though, I am already at the stage where I have all the data in the buckets but it is in random order. So I know need to sort it, normally in quicksort I just keep track of the left and right markers but the fact there are buckets and bins makes this a lot harder which is the bit I cannot figure out?

  9. #9
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Why are you splitting your input into buckets?

    Does the bucketing have to be done before the sort (if so, why)?

    Do you have to use quicksort?

    What is the range of the int you're sorting on?

    How big is your struct?
    Last edited by oogabooga; 02-17-2012 at 11:30 AM.

  10. #10
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    If you want to sort each bucket separately then go ahead. If you want the entire collection of 3000000 entries sorted you'll need to merge the lists. Seems simple.

  11. #11
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Okay so you have an array of linked-lists (all of which happen to be the same length) and you want to sort is such that of all the n elements, the k lowest are in the first list and the next k lowest are in the next list etc. Clearly items do not necessarily end up in the bucket they started in.

    What this all says to me is that you should do this via the following three steps:
    Step 1: Join all of the linked-lists together
    Step 2: Use a linked-list implementation of a sorting algorithm
    Step 3: Cut the list up into lists of the original size and store each head back into the original array.

    So, I hate to break it to you, but your array-based implementation of quicksort is not going to be of much use to you here.

    It looks to me like the only reason for having the array of lists at all as opposed to keeping it as one list afterwards, is that you can perform lookups faster on this structure by doing a binary search based on the head items in the array. This gives you O(log(n/k) + k) lookup time, where k is the list length and n is the total number of items. I would just pick a better data structure than a list to begin with.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  12. #12
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Quote Originally Posted by iMalc View Post
    Okay so you have an array of linked-lists (all of which happen to be the same length) and you want to sort is such that of all the n elements, the k lowest are in the first list and the next k lowest are in the next list etc. Clearly items do not necessarily end up in the bucket they started in.

    What this all says to me is that you should do this via the following three steps:
    Step 1: Join all of the linked-lists together
    Step 2: Use a linked-list implementation of a sorting algorithm
    Step 3: Cut the list up into lists of the original size and store each head back into the original array.

    So, I hate to break it to you, but your array-based implementation of quicksort is not going to be of much use to you here.

    It looks to me like the only reason for having the array of lists at all as opposed to keeping it as one list afterwards, is that you can perform lookups faster on this structure by doing a binary search based on the head items in the array. This gives you O(log(n/k) + k) lookup time, where k is the list length and n is the total number of items. I would just pick a better data structure than a list to begin with.
    Hi you are right, I am doing it this way so I can search the structure quickly once it is all ordered.

  13. #13
    Registered User
    Join Date
    Feb 2012
    Posts
    7
    Quote Originally Posted by oogabooga View Post
    Why are you splitting your input into buckets?

    Does the bucketing have to be done before the sort (if so, why)?

    Do you have to use quicksort?

    What is the range of the int you're sorting on?

    How big is your struct?
    No it doesn't have to be done before the bucketing and infact seem easier to do before! but how can I do that if i have millions of structs as struct obj arrayname[3000000] just seg faults when I run it?

  14. #14
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    You didn't say how big your struct is.

    If your problem is that you're declaring the array locally and it won't fit on the stack, you could try declaring it globally.

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by zeronero View Post
    No it doesn't have to be done before the bucketing and infact seem easier to do before! but how can I do that if i have millions of structs as struct obj arrayname[3000000] just seg faults when I run it?
    If you are doing what you had stated earlier, then you have no need for such a large array. Why do you now want a large array instead of a smaller array of linked-lists?

    Or rather, once you make your mind up about how you plan to store the data, please ask a specific question on what part you are having trouble with. It's harder to help a moving target.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. quicksort
    By thescratchy in forum C Programming
    Replies: 3
    Last Post: 03-15-2010, 12:44 PM
  2. Help with quicksort
    By c++prog in forum C++ Programming
    Replies: 12
    Last Post: 11-24-2009, 11:01 PM
  3. quicksort
    By WelshGrandSlam in forum C++ Programming
    Replies: 8
    Last Post: 01-23-2008, 08:15 PM
  4. Quicksort again
    By rvanbeusichem in forum C Programming
    Replies: 3
    Last Post: 11-25-2004, 04:32 AM
  5. quicksort
    By Unregistered in forum C++ Programming
    Replies: 2
    Last Post: 01-12-2002, 02:07 AM