Better algorithm for this ? (generating combinations)

This is a discussion on Better algorithm for this ? (generating combinations) within the C++ Programming forums, part of the General Programming Boards category; The problem is to generate all possible unique selections of n numbers from a given set of size > n. ...

  1. #1
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498

    Better algorithm for this ? (generating combinations)

    The problem is to generate all possible unique selections of n numbers from a given set of size > n.
    I've pretty much done it by brute force, as the number of final results should be same whichever way I choose. If a better way is not possible, I'd like suggestions about simple optimizations.
    Code:
    std::deque<std::set<int>> find_combinations(const std::set<int>& U,unsigned int n)
    {
        std::deque<std::set<int>> result;
        if(n<1 || U.size()<n)//Nothing Happens Here !
            {}
        else if(n==1) // The Universal input set is broken up and redistributed.
        {
            for(auto x:U)
                result.push_back(std::set<int>({x}));
        }
        else // Recursion FTW, no comments !!
        {
            std::set<int> temp(U);
            for(auto x:U)
            {
                temp.erase(x);
                auto partial_result = find_combinations(temp,n-1);
                for(auto& y:partial_result)
                    y.insert(x);
                result.insert
                (
                    std::begin(result),
                    std::begin(partial_result),
                    std::end(partial_result)
                );
            }
        }
        return result;
    }
    I noticed that having the return container a deque makes this much faster than if a vector or list is used. Why ?
    Also, is it worth the trouble trying to do it iteratively?
    Finally, can anyone help me determining the complexity of this function ?
    (I think it is something near (log base n U.size() )^ n) but I'm probably very very wrong about that.)

    Btw, here is a main you can use to test it.
    Code:
    int main(int argc,char** argv)
    {
        std::set<int> input;
        for(int i=0;i<atoi(argv[1]);i++)
            input.insert(i);
        
        auto foo = find_combinations(input,atoi(argv[2]));
        
    //     for(auto x:foo) //displaying results
    //     {
    //         for(auto y:x)
    //             std::cout<<y<<'\t';
    //         std::cout<<std::endl;
    //     }
    }
    Last edited by manasij7479; 05-10-2012 at 04:38 AM.
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,169
    Also, is it worth the trouble trying to do it iteratively?
    Iterative processing almost always wins over collection processing, but what happens behind the scenes isn't relevant to a client.

    It is only worth it if you are in a situation where you need to preserve stack space or coding for a standard that prevents recursion of any type.

    Soma

  3. #3
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Quote Originally Posted by phantomotap View Post
    It is only worth it if you are in a situation where you need to preserve stack space or coding for a standard that prevents recursion of any type.
    With move constructors in work here, I'd say that the stack issue is absent.
    (Other than making the temp set in line#13, but it'd be same if done with loops.)
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,169
    With move constructors in work here, I'd say that the stack issue is absent.
    Move constructors have nothing to do with stack space unless the allocator storage lives on the stack; the data for the `std::set<???>' and `std::deque<???>' live on the heap.

    The overhead for saving registers is probably larger than the cost of the few local variables.

    The metric for stack preservation in industry standards is targeted more at limiting stack depth than the size of any given frame.

    In this case, the stack is linear with `n'. The size of any individual frame wouldn't even be considered at a lot of places; this recursion simply would not be allowed because it is linear.

    I've had to argue for binary recursions that are practicably limited to a depth of only 19. I've lost every single time.

    Soma

  5. #5
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,046
    Returning an std::deque<std::set<int>> looks expensive to me; I'd pass in the std::deque as a reference parameter and add results to it there. Also, it may be cheaper to pass along a single std::set<int> &U, where each recursive call removes items from U and then adds them back in as necessary. It depends on how expensive lookups and removals are (and other containers may be better in this regard, although you'd have to be careful to make the container stable -- add back to same position removed from). Maybe I'm too much of a Prolog programmer, but it makes way more sense to me to maintain data structures that are updated at each recursive call than making copies all the time. Also, this becomes much easier to convert to an iterative version if you like.

    So here's how I would approach the problem. Sorry there aren't many comments in the code, ask me to elaborate if any of it is confusing.
    Code:
    #include <iostream>
    #include <set>
    #include <vector>
    #include <deque>
    
    std::deque<std::set<int>> find_combinations(const std::set<int>& U,unsigned int n)
    {
        std::deque<std::set<int>> result;
        if(n<1 || U.size()<n)//Nothing Happens Here !
            {}
        else if(n==1) // The Universal input set is broken up and redistributed.
        {
            for(auto x:U)
                result.push_back(std::set<int>({x}));
        }
        else // Recursion FTW, no comments !!
        {
            std::set<int> temp(U);
            for(auto x:U)
            {
                temp.erase(x);
                auto partial_result = find_combinations(temp,n-1);
                for(auto& y:partial_result)
                    y.insert(x);
                result.insert
                (
                    std::begin(result),
                    std::begin(partial_result),
                    std::end(partial_result)
                );
            }
        }
        return result;
    }
    
    /** Generates combinations of @a data with @a n elements. Results will be added
        to @a results, which should have @c data.size() choose @a n elements after
        this function returns.
        
        @a chosen and @a start indicate partial solutions, and should initially be
        empty and zero, respectively.
    */
    void find_combinations_dwk(std::deque<std::vector<int> > &results,
        std::vector<int> &data, std::vector<int> &chosen,
        unsigned start, unsigned n) {
        
        if(n <= 0) {
            results.push_front(chosen);
            return;
        }
        
        for(std::vector<int>::size_type x = start; x < data.size(); x ++) {
            int value = data[x];
            data.erase(data.begin() + x);
            
            chosen.push_back(value);
            find_combinations_dwk(results, data, chosen, x, n - 1);
            chosen.pop_back();
            
            data.insert(data.begin() + x, value);
        }
    }
    
    int main(int argc, char *argv[]) {
        std::set<int> data;
        std::vector<int> dataVector;
        for(int x = 0; x < 20; x ++) {
            data.insert(x);
            dataVector.push_back(x);
        }
        
        if(argc == 1) {
            auto size = find_combinations(data, 8).size();
            std::cout << "find_combinations found: " << size << std::endl;
        }
        else {
            std::deque<std::vector<int> > results;
            std::vector<int> chosen;
            find_combinations_dwk(results, dataVector, chosen, 0, 8);
            
            std::cout << "find_combinations_dwk found: "
                << results.size() << std::endl;
        }
        
        return 0;
    }
    Timing results:
    Code:
    $ g++ --std=c++0x choose.cpp -o choose
    $ time ./choose
    find_combinations found: 125970
    
    real	0m3.624s
    user	0m3.576s
    sys	0m0.044s
    $ time ./choose dwk
    find_combinations_dwk found: 125970
    
    real	0m0.320s
    user	0m0.300s
    sys	0m0.020s
    $
    So it looks like my method is roughly an order of magnitude faster, at least for this input size! Of course it's not exactly a fair comparison, as I needed to use std::vector to avoid invalidated iterators while your method uses std::set, but I think you'll find that coding in this style can often be more efficient.

    If you wanted to make my version iterative, it wouldn't be too hard. Just notice the only part that relies on recursion: really, the values of x. So all you'd need would be an std::stack of values of x, put the whole function into a loop and push and pop values of x as necessary. At least in theory.

    Cheers, dwk.

    [edit] Note I didn't actually check the output, but because the container has 20 choose 8 elements, I'm pretty sure it's correct. Or close enough. [/edit]
    Last edited by dwks; 05-10-2012 at 03:45 PM.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,294
    Quote Originally Posted by dwks View Post
    Maybe I'm too much of a Prolog programmer ...
    Cool. It's good to find someone else who also likes Prolog. Whilst I haven't used it in a few years, I've still got some of the games I made with it lying around, and still think of messing with it from time to time.

    Nice use of C++11 there manasij7479!
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    839
    Code:
    CombinationMatrix::CombinationMatrix(unsigned int numVars, unsigned int numStates):
    	numVars(numVars),
    	numStates(numStates),
            Matrix(pow((double)numStates, (double)numVars), numVars)//(rows,columns)
    {
        unsigned int i, j;
    	std::vector<unsigned int> row(columns);
        for(i=1;i<rows;++i)
        {
            baseConvert(row, i, numStates);
            for(j=0;j<columns;++j)
            {
                (*this)(i,j)=row[j];
            }
    	row.assign(row.size(),0);
        }
    }
    
    //...
    
    static void baseConvert(std::vector<unsigned int>& result, unsigned int value,unsigned int base)
    {
        unsigned int digits = (unsigned int)1+log((double)value)/log((double)base);
        for(int i=0;i<digits-1;i++)
        {
            result[i]=value%base;
            value/=base;
        }
        result[digits-1]=value;
    }
    if i understand you correctly, for your case, i believe the number of states == number of variables.
    Last edited by m37h0d; 05-11-2012 at 07:25 AM.

  8. #8
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,046
    Hmm, that's an interesting idea, m37h0d. There's not quite enough code there for me to figure it out at a quick glance but I think generating bit-patterns and using them as a kind of shadow-array reference is a good idea. Certainly you'd save on memory if you were generating permutations of more complex objects.

    Although, if you're going to go that route there are bit-manipulation algorithms that generate the next n-bit number with k one bits, which is exactly what we need. That's probably the way to go, either chaining together real ints or just implementing this stuff in your own bignum kind of library. I imagine it would be faster than calling log() all the time. Bit Twiddling Hacks

    @iMalc: I haven't used Prolog much in a while either, but it definitely trains you to think in terms of recursion and which lists you're adding stuff to and which lists you're removing stuff from... It's a good language to be familiar with, I think.

    I echo the sentiment about C++11: nice job, manasij7479.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  9. #9
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Quote Originally Posted by phantomotap View Post
    Move constructors have nothing to do with stack space unless the allocator storage lives on the stack; the data for the `std::set<???>' and `std::deque<???>' live on the heap.
    Sorry, I was confused about something else regarding move semantics here... and I always forget that the standard allocators do not store on the stack.
    Last edited by manasij7479; 05-12-2012 at 04:15 AM.
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  10. #10
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Quote Originally Posted by dwks View Post
    Also, it may be cheaper to pass along a single std::set<int> &U, where each recursive call removes items from U and then adds them back in as necessary.
    Neat idea.. doesn't seem to work very well for sets, though. (where, it should do better, iirc)

    It took some effort for me to understand your code. (... and I'm nowhere near 'seeing' what m37h0d's is doing; yet.)
    Thanks.
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  11. #11
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Quote Originally Posted by iMalc View Post
    Nice use of C++11 there manasij7479!
    Quote Originally Posted by dwks View Post
    I echo the sentiment about C++11: nice job, manasij7479.
    C++11 makes it remarkably easy to write naive code in an elegant way.
    This was a nice example of that.
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



  12. #12
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,169
    C++11 makes it remarkably easy to write naive code in an elegant way.
    O_o

    remarkably easy
    o_O

    That is one for the record books. ^_^

    I can't even decide on which silly thing to say.

    *shrug*

    Okay.


    *leans away from desk lamp*

    *steeples fingers*

    He is coming along nicely...

    Bwahahahah!

    Seriously though, if you are at the point where that flavor of code is "remarkably easy" you are going to be a great programmer.

    Soma

  13. #13
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,169
    >_<

    Nope. I'm not happy with it; I should have went with the "Dr. Who" reference.

    Soma

  14. #14
    Registered User manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    Kolkata@India
    Posts
    2,498
    Quote Originally Posted by phantomotap View Post
    remarkably easy
    I taught my cousin (in the 1st week of his C++ experience) basics of using containers and ranged loops.

    If he, being a physics student, could grasp that more easily than using counter variables and C strings, then it is definitely 'remarkably easy' .
    Manasij Mukherjee | gcc-4.8.2 @Arch Linux
    Slow and Steady wins the race... if and only if :
    1.None of the other participants are fast and steady.
    2.The fast and unsteady suddenly falls asleep while running !



Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Generating number combinations
    By litzkrieg in forum C Programming
    Replies: 23
    Last Post: 03-01-2011, 10:25 AM
  2. Trouble Generating Game Algorithm
    By justlivelife15 in forum Game Programming
    Replies: 3
    Last Post: 06-13-2007, 09:58 AM
  3. Combinations
    By Cmuppet in forum C Programming
    Replies: 6
    Last Post: 10-19-2004, 07:39 AM
  4. Combinations
    By GaPe in forum C Programming
    Replies: 16
    Last Post: 01-09-2002, 04:38 AM
  5. Replies: 10
    Last Post: 01-07-2002, 03:03 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21