Like Tree1Likes

How to scan multiple vectors for common elements efficiently?

This is a discussion on How to scan multiple vectors for common elements efficiently? within the C++ Programming forums, part of the General Programming Boards category; It's not clear why counter is a lambda. Either make it a free function (possibly static or namespace ""), or ...

  1. #16
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    It's not clear why counter is a lambda. Either make it a free function (possibly static or namespace ""), or put the body in tp.schedule(...). Do which ever is easier to read.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  2. #17
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    1,044
    Good question about why counter is a lambda. It's because I don't feel like it deserves to be its own free function (it would just obfuscate my main.cpp and it doesn't fit into any other .cpp file except maybe my tetra.cpp one) and I honestly only use it once and only once for one purpose. So I figured, heck, why not go ahead and just write it down as a lambda and people can easily read that it's a small sub-function within my main loop.

    Nevertheless, after much experimentation, I finally got something would actually compile and work. Also Elysia, I need those mutex locks otherwise my counting is subject to race conditions. I did some trial and error without the use of mutexes (but using else-if's because it's stylistically superior, imo) and for 22^3 particles, I got 8039 inside and 1378 on the surface for one run and for another I got 7515 inside and 1365 on the surface for another. Neither of those add up to the proper sum but when I use mutexes I get a consistent 9260 inside + 1388 on the surface for a grand total of 10648 which is exactly 22^3 so I mutexes ftw!

    My revised code looks like this :
    Code:
    /* Create and use a lambda function to count the number of particles contained by the root */
    
    
       int *inside = new int, *surface = new int, *outside = new int;
       *inside = 0; *surface = 0; *outside = 0;
    
    
       auto counter = [](int *inside, int *surface, int *outside, vector<particle> parts, tetra *root) {
    
    
          int x = 0;
    
    
          for (auto it = parts.begin(); it < parts.end(); ++it) {
    
    
             x = inside_tetra(root, &(*it));
    
    
             if (x == 0) {
    
    
                ins.lock();
                ++(*inside); 
                ins.unlock();
             }
    
    
             else if (x == -1) {
    
    
                out.lock();
                ++(*outside);
                out.unlock();
             }
    
    
             else if (x > 0) {
    
    
                surf.lock();
                ++(*surface);
                surf.unlock();
             }
          }
    
    
          return;
       };
    
    
       for (unsigned i =0; i < num_threads; ++i)
          tp.schedule([=]() { return counter(inside, surface, outside, particle_sets[i], root_tetra); });
    
    
       tp.wait();
    
    
       cout << "\nThere are " << *inside << " points inside the root" << endl;
       cout << "There are " << *surface << " points on the surface of the root" << endl;
       cout << "There are " << *outside << " points outside the root" << endl;
       cout << *inside + *surface << " total contained points" << endl;
    
    
       delete (inside), delete (surface), delete(outside);
    I suppose though I should be using auto_ptr instead of manual new() and delete() and you're right about the unique lock. Those are small revisions though which I will take into consideration or implement eventually. Heck, I should just do it now.

    Thank you for all the advice though, I really do appreciate it

  3. #18
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,537
    Quote Originally Posted by MutantJohn View Post
    Nevertheless, after much experimentation, I finally got something would actually compile and work. Also Elysia, I need those mutex locks otherwise my counting is subject to race conditions. I did some trial and error without the use of mutexes (but using else-if's because it's stylistically superior, imo) and for 22^3 particles, I got 8039 inside and 1378 on the surface for one run and for another I got 7515 inside and 1365 on the surface for another. Neither of those add up to the proper sum but when I use mutexes I get a consistent 9260 inside + 1388 on the surface for a grand total of 10648 which is exactly 22^3 so I mutexes ftw!
    I didn't say you should get rid of the mutextes. I simply said you shouldn't lock them manually. That is what we have lock guards for.

    I suppose though I should be using auto_ptr instead of manual new() and delete() and you're right about the unique lock.
    unique_ptr
    Though I really fail to see why these ints must be allocated on the heap.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  4. #19
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    1,044
    Alright, maybe you can help me with this then. The code I have hear is totally suffering from race conditions very badly.
    Code:
    array<vector<vector<tree*>>, 2> scan_leaves(vector<tree*> *leaves, pool tp) {
    
    
       cout << "Pool size : " << tp.size() << endl; // this checks out fine
    
    
       array<vector<vector<tree*>>, 2> safe_and_unsafe; // we're sorting out safe vs unsafe for multithreading
    
    
       //array<vector<pair<vector<tree*>, particle*>>, 2> unsafe_safe; This is a prototype for later optimization
    
    
       vector<unsigned> safety_check; // use as numeric array
       safety_check.reserve(num_threads); // reserve base capacity
       safety_check.assign(num_threads, 0); // initialize to all zeroes
    
    
       for (unsigned i=0; i < num_threads; ++i) { // leaves is a pointer to a number of vectors equal to num_threads
    
    
          tp.schedule([=]() { sort(leaves[i].begin(), leaves[i].end()); } ); // this works fine as each set of data is independent
       }
    
    
       tp.wait();
    
    
       mutex lck; //create a mutex
    
    
       for (unsigned i=0; i< num_threads-1; ++i) { // this is where the trouble begins...
    
    
          tp.schedule([&]() { 
    
    
             for (unsigned j= i+1; j < num_threads; ++j) {
    printf("(%u, %u)\n", i, j); // this is supposed to print the following (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)
                for (auto it = leaves[j].begin(); it < leaves[j].end(); ++it) {
    
    
                   if (binary_search(leaves[i].begin(), leaves[i].end(), *it)) {
    
    
                      lck.lock();
                      safety_check.at(i) += 1;
                      safety_check.at(j) += 1;
                      lck.unlock();
                   }
                }  
             } 
          } );
    
    
       }
    
    
       tp.wait();
    
    
       for (unsigned i=0; i < safety_check.size(); ++i)
          if (safety_check.at(i) > 0)
             safe_and_unsafe.at(1).push_back(leaves[i]);
          else
             safe_and_unsafe.at(0).push_back(leaves[i]);
    
    
       cout << "array<0>.size = " << safe_and_unsafe.at(0).size() << endl;
       cout << "array<1>.size = " << safe_and_unsafe.at(1).size() << endl;
    
    
       for (auto & s : safety_check)
          cout << s << endl;
    
    
       return safe_and_unsafe;
    }
    Basically, it's supposed to work like this, I have 4 vectors as part of leaves, leaves[0], leaves[1], leaves[2] and leaves[3]. As of now, each vector has one element and it's the same. These are pointers so for example, the value for the only element for all 4 is 0x209a9e0 and this matches well with previous data.

    My algorithm is supposed to go like this, do vector 0 and vector 1 have any elements in common? If so, then add 1's to safety_check at index 0 and 1. Do vector 0 and vector 2 have any elements in common? If so, add 1's to safety_check at index 0 and 2. Do vector 0 and vector 3 have any elements in common? If so...

    That's why I said we have a read spread like this :
    (0, 1) (1, 2) (2, 3)
    (0, 2) (1, 3)
    (0, 3)

    When I do this with just one thread, I get the proper spread for safety_check which is 3, 3, 3, 3.

    How do I get that? Simple!
    (0, 1) share so
    safety_check = 1, 1, 0, 0

    (0, 2) share so
    safety_check = 2, 1, 1, 0

    (0, 3) share so
    safety_check = 3, 1, 1, 1

    (1, 2) share so
    safety_check = 3, 2, 2, 1

    (1, 3) share so
    safety_check = 3, 3, 2, 2

    (2, 3) share so
    safety_check = 3, 3, 3, 3

    and that's basically the algorithm so why am I getting slapped in the face by race conditions so hard?

  5. #20
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    Quote Originally Posted by MutantJohn View Post
    Good question about why counter is a lambda. It's because I don't feel like it deserves to be its own free function (it would just obfuscate my main.cpp and it doesn't fit into any other .cpp file except maybe my tetra.cpp one) and I honestly only use it once and only once for one purpose. So I figured, heck, why not go ahead and just write it down as a lambda and people can easily read that it's a small sub-function within my main loop.
    That doesn't seem like a good enough reason. Splitting big functions into smaller ones does not clutter cpp files. You've basically got a non-trivial function nested in another function. It's not what C++ programmers are used to. I'd just pull it out and make it static (or empty namespace). You can also give it a longer name, to make it's narrow use clear.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  6. #21
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    1,044
    Alright, that's fair. I do want to keep consistent with how a C++ programmer would code.

    Also, I seem to have fixed my race conditions that I posted earlier. I am slowly becoming a master of the lambdas!!!!

    Code:
    #include "structures.h"
    
    
    void scan_leaves(vector<pair<vector<tree*>, particle*>> & pile_pair, pool & tp) {
    
    
    /* Assert that our references are valid */
    
    
       assert(tp.size() == num_threads);
       assert(pile_pair.size() == num_threads);
    /*
       tree t1(0x0), t2(0x0), t3(0x0);
    
    
       pile_pair.at(0).first.push_back(&t2);
       pile_pair.at(0).first.push_back(&t3);
       pile_pair.at(0).first.push_back(&t1);
    
    
       cout << "This is the original unsorted vector of pile.at(0).first :\n";
       for (auto it = pile_pair.at(0).first.begin(); it < pile_pair.at(0).first.end(); ++it)
          cout << (size_t)*it << endl;
    */
    /* Sort our collected leaves */
    
    
       for (auto it = pile_pair.begin(); it < pile_pair.end(); ++it) { 
    
    
          tp.schedule([=]() { sort((*it).first.begin(), (*it).first.end()); });
       }
    
    
       tp.wait();
    /*
       cout << "\nThis is the new sorted vector of pile.at(0).first :\n";
       for (auto it = pile_pair.at(0).first.begin(); it < pile_pair.at(0).first.end(); ++it)
          cout << (size_t)*it << endl;
    */
    
    
       vector<unsigned> safety_check;
       safety_check.reserve(num_threads);
       safety_check.assign(num_threads, 0);
    
    
       assert(safety_check.size() == num_threads);
    
    
    
    
    
    
       for (vector<pair<vector<tree*>, particle*>>::iterator it = pile_pair.begin(); it < pile_pair.end(); ++it) {
    
    
          tp.schedule([=, &safety_check, &pile_pair]() { 
    
    
             for (vector<pair<vector<tree*>, particle*>>::iterator it2 = it+1; it2 < pile_pair.end(); ++it2) {
    
    
                for (vector<tree*>::iterator it3 = (*it2).first.begin(); it3 < (*it2).first.end(); ++it3) {
    
    
                   if (binary_search((*it).first.begin(), (*it).first.end(), *it3)) {
                      safety_check.at(distance(pile_pair.begin(), it)) += 1;
                      safety_check.at(distance(pile_pair.begin(), it2)) += 1;
                   }
                }
             }
          });
       }
    
    
       tp.wait();
    
    
       for (auto & s : safety_check)
          cout << s << endl;   
    
    
       return;
    }
    And yes, using vector<pair<vector<tree*>, particle*>> is very confusing lol.

  7. #22
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    Yeah, that does look confusing. Can't you make some part of that a class? Like turn the pair into a class (or POD struct) that has better names than first and second. And add more convenience functions to the interface.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  8. #23
    Registered User
    Join Date
    Apr 2013
    Posts
    1,273
    I'm wondering that since there's just one memory bus, and usually just one common outer cache, will multiple threads really help much? Also, instead of going through a vector and doing a binary search, once the vectors are sorted, perhaps it would be faster to perform an operation similar to a k-way merge when scanning for duplicates.

  9. #24
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,537
    Why would multiple threads not help (if done right)? The reason we have multicores is because it's faster.
    But I can imagine there will be a lot of false sharing on here.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  10. #25
    Registered User
    Join Date
    Apr 2006
    Posts
    2,023
    Quote Originally Posted by Elysia View Post
    Why would multiple threads not help (if done right)? The reason we have multicores is because it's faster.
    But I can imagine there will be a lot of false sharing on here.
    Because memory bound applications don't benefit from multicore processors. They are bound by memory latency on the memory bus, which is not affected by the number of cores. As rcgldr says, there's only one memory bus. In those cases improving performance means buying faster RAM, not more CPU cores.

    That said, my instinct is that this code is not bound in that way.
    Last edited by King Mir; 11-12-2013 at 10:10 PM.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  11. #26
    C++まいる!Cをこわせ! Elysia's Avatar
    Join Date
    Oct 2007
    Posts
    22,537
    Quote Originally Posted by King Mir View Post
    Because memory bound applications don't benefit from multicore processors. They are bound by memory latency on the memory bus, which is not affected by the number of cores. As rcgldr says, there's only one memory bus. In those cases improving performance means buying faster RAM, not more CPU cores.

    That said, my instinct is that this code is not bound in that way.
    Yes, you are right, but the way rcgldr put it, it sounded more like there is no way to get parallelism unless you have more than one memory bus and more than one outer cache, both of which does not change. There is always one memory bus and two levels of outer cache (at least for now).
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  12. #27
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    4,214
    Also, instead of going through a vector and doing a binary search, once the vectors are sorted, perhaps it would be faster to perform an operation similar to a k-way merge when scanning for duplicates.
    O_o

    I thought about this for a bit, and I like the idea. Assuming you limit "k" reasonably, making cache efficient runs, you'd probably need multiple passes, but that still dramatically reduces the number of comparisons, and likely reduces seeking, and you can easily thread runs if desired by merging the runs after the threaded pass is finished.

    That said, my instinct is that this code is not bound in that way.
    It seems to me as if every "pair" of threads added to the scheduler shares at least two containers. Consider:

    Code:
    // attempting to simplify
    void thread[0]()
    {
        for(auto & c: pile_pair[1].first)
        {
            if(binary_search(pile_pair[0].first.begin(), pile_pair[0].first.end(), c))
            {
                safety_check[0] += 1;
            }
        }
    }
    
    void thread[1]()
    {
        for(auto & c: pile_pair[2].first)
        {
            if(binary_search(pile_pair[1].first.begin(), pile_pair[1].first.end(), c))
            {
                safety_check[1] += 1;
            }
        }
    }
    The situation may not be enough to make threads compete for the bus, but then I don't have enough information to speculate about the size and complexity of the data.

    Of course, giving each thread its own storage (`vector<unsigned> * safety_check(new [pile_pair.size()]);') might eliminate enough of the potential competition in any event.

    Soma
    “Often out of periods of losing come the greatest strivings toward a new winning streak.” -- Fred Rogers
    “Salem Was Wrong!” -- Pedant Necromancer

  13. #28
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    1,044
    Seeing these responses makes me happy

Page 2 of 2 FirstFirst 12
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. How to scan in multiple values from a text file
    By GenghisAhn in forum C Programming
    Replies: 12
    Last Post: 10-30-2012, 02:52 AM
  2. Using/ Searching through Multiple Vectors
    By bengreenwood in forum C++ Programming
    Replies: 10
    Last Post: 08-03-2009, 08:35 AM
  3. Multiple vectors
    By dpp in forum C++ Programming
    Replies: 5
    Last Post: 01-16-2009, 05:20 PM
  4. How to handle multiple cases which use common code?
    By tmaxx in forum C Programming
    Replies: 3
    Last Post: 10-03-2008, 07:42 AM
  5. scan multiple items in multidimensional array
    By requiem in forum C Programming
    Replies: 1
    Last Post: 04-17-2003, 03:02 PM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21