Thread: best way to sort large amount of data.

  1. #1
    Registered User
    Join Date
    Jul 2013
    Location
    Germany
    Posts
    499

    best way to sort large amount of data.

    I have been learning about bubble sorting, binary trees, quick sort and a few other ways to to sort data using the Big O notation. While I wrote the code for a bubble sort I thought I would generate the worst case scenario. I inputed 100,000 values into a vector in descending order and sort them, and naturally outputted the values. It took 20 whole seconds to complete and I used the code I seen in the book.

    However I decided to use the code I have always been using when I wanted to sort values.
    Code:
    struct mySort
    {
     bool operator (int a, int b) {return a<b;}
    }Sort_Greater;
    then I would use
    Code:
    std::sort(myvector.begin,myvector.end,Sort_greater);
    This was done in 2 seconds with the same vector with 1,000,000,000 values, I was blown away. What algorithm is sort using? I looked on cplusplus.com and it basically told me how to use it.

    Is there more ways to sort out there as simple as this?

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Sorting algorithm - Wikipedia, the free encyclopedia

    sort - C++ Reference
    Most of the STL comes with a "complexity guarantee".
    Which in the case of std::sort means N*log2(N).

    There are several algorithms which meet this requirement.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    In the case of HP / Microsoft's STL implementation, std::sort is a quick sort and std::stable_sort is a merge sort (unless element count is very small, then it's insertion sort).

  4. #4
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by jocdrew21 View Post
    Code:
    struct mySort
    {
     bool operator (int a, int b) {return a<b;}
    }Sort_Greater;
    then I would use
    Code:
    std::sort(myvector.begin,myvector.end,Sort_greater);
    Btw, for types that can work with the <,> operators, you can just use the function objects from <functional>.
    In this case, your code would look like
    Code:
    std::sort(myvector.begin(),myvector.end(),std::less);
    (...due to it being the default behaviour, you can also omit the 3rd argument)

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Also, given that you want to use some complex sorting criteria, you can also do
    Code:
    std::sort(myvector.begin(), myvector.end(), [](T left, T right) -> bool { /* your sorting critera here */ });
    Where T is the type of the elements in your vector. This requires C++11 support in your compiler.
    If using a C++14 compliant compiler (e.g. clang), you can simplify it to
    Code:
    std::sort(myvector.begin(), myvector.end(), [](auto left, auto right) { /* your sorting critera here */ });
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  6. #6
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by Elysia View Post
    If using a C++14 compliant compiler
    Not specifying the return type seems to work fine with C++11.
    Why? (GCC extension..perhaps??)

  7. #7
    Registered User
    Join Date
    Jul 2013
    Location
    Germany
    Posts
    499
    @macasij7479 thank you, I didn't know that.

    @rcgldr you answered my question I was thinking when I read the article on cplusplus.com, thanks for clear it up.

    @Salem thanks for the reference...

    What do you guys think is the best way to sort lets say 10 million ints in random order?

  8. #8
    Registered User
    Join Date
    Jul 2013
    Location
    Germany
    Posts
    499
    @Elysia, I am assuming that is lambda right? I am very interested in how you just coded that. It looks very nice!!!!

  9. #9
    Registered User
    Join Date
    Jul 2013
    Location
    Germany
    Posts
    499
    Code:
    std::sort(Myvector.begin(), Myvector.end(), [](auto left, auto right) { return left < right;});
    Did I use this right because I get an error. Maybe Xcode doesn't support C++14

    Code:
    std::sort(Myvector.begin(), Myvector.end(), [](int left, int right) { return left < right;});
    this works fine.... WOW thanks so much!!!!!

  10. #10
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by manasij7479 View Post
    Not specifying the return type seems to work fine with C++11.
    Why? (GCC extension..perhaps??)
    It does, but only if you have a simple return statement in the lambda body. That's why I added the return type.
    If you have a more complex body, you must specify return type. This restriction was lifted from C++14, as long as all return statements have the same return type.

    Quote Originally Posted by jocdrew21 View Post
    @Elysia, I am assuming that is lambda right? I am very interested in how you just coded that. It looks very nice!!!!
    Yes, it's called a lambda.
    Last edited by Elysia; 02-08-2014 at 12:34 PM.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Elysia
    If using a C++14 compliant compiler (e.g. clang)
    Technically, there is no such thing as a C++14 compliant compiler at this time
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by laserlight View Post
    Technically, there is no such thing as a C++14 compliant compiler at this time
    Well, you have a point, I guess. Draft C++14 compliant may be a better term, then
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  13. #13
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Elysia View Post
    Well, you have a point, I guess. Draft C++14 compliant may be a better term, then
    That's an oxymoron - compliance is relative to an accepted standard or set of criteria, the word draft means not yet accepted and subject to change.

    I'd suggest a "compiler that provides features from the C++14 draft" would be an appropriate description.

    Yes, there was a number of compiler vendors who described their products as ANSI compliant before the first C++ standard was ratified by ANSI (and later ISO). The truth is the first casualty when marketeers get involved in claims of compliance.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  14. #14
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by jocdrew21 View Post
    What do you guys think is the best way to sort lets say 10 million ints in random order?
    The fastest way would be Radix sort. Example code below. pData is a pointer to the array, pTemp is a pointer to a temp array of the same size. UI64 is a 64 bit unsigned integer, PUI64 is a pointer to a 64 bit unsigned integer. In this case RadixSort() always takes 8 passes so the sorted data always ends up in pData and RadixSort() will always return a pointer to pData. For 32 bit ints, the index matrix size would be [4][256], and only 4 passes made.

    Code:
    typedef unsigned long long UI64;
    typedef unsigned long long *PUI64;
    
    PUI64 RadixSort(PUI64 pData, PUI64 pTemp, size_t count)
    {
    size_t mIndex[8][256] = {0};            /* index matrix */
    PUI64 pDst, pSrc, pTmp;
    size_t i,j,m,n;
    UI64 u;
    
        for(i = 0; i < count; i++){         /* generate histograms */
            u = pData[i];
            for(j = 0; j < 8; j++){
                mIndex[j][(size_t)(u & 0xff)]++;
                u >>= 8;
            }       
        }
        for(j = 0; j < 8; j++){             /* convert to indices */
            n = 0;
            for(i = 0; i < 256; i++){
                m = mIndex[j][i];
                mIndex[j][i] = n;
                n += m;
            }       
        }
    
        pDst = pTemp;                       /* radix sort */
        pSrc = pData;
        for(j = 0; j < 8; j++){
            for(i = 0; i < count; i++){
                u = pSrc[i];
                m = (size_t)(u >> (j<<3)) & 0xff;
                pDst[mIndex[j][m]++] = u;
            }
            pTmp = pSrc;
            pSrc = pDst;
            pDst = pTmp;
        }
    
        return(pSrc);
    }
    Last edited by rcgldr; 02-08-2014 at 07:18 PM.

  15. #15
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by jocdrew21 View Post
    What do you guys think is the best way to sort lets say 10 million ints in random order?
    std::sort is fine. No need to invest anything unless you can prove the performance isn't sufficient.
    Also works on any type as long as you can compare them using operator < (as opposed to, say, radix sort).
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. [C++] Sending variable amount of data over Winsock
    By netik in forum C++ Programming
    Replies: 3
    Last Post: 10-31-2013, 05:15 AM
  2. Help managing large amounts of data and generating data from it
    By jakethehake in forum C++ Programming
    Replies: 0
    Last Post: 12-01-2009, 06:59 AM
  3. Copying constant amount of data
    By TriKri in forum C++ Programming
    Replies: 16
    Last Post: 07-12-2008, 06:32 AM
  4. [Large file][Value too large for defined data type]
    By salsan in forum Linux Programming
    Replies: 11
    Last Post: 02-05-2008, 04:18 AM
  5. Read large amount of numbers from file
    By h3ro in forum C++ Programming
    Replies: 7
    Last Post: 05-10-2007, 09:33 AM