Thread: Using OpenMP and STL vectors

  1. #1
    Registered User
    Join Date
    Aug 2009
    Posts
    140

    Using OpenMP and STL vectors

    Hi

    I have a program, where I
    Code:
    push_back
    some numbers during a for-loop, as in

    Code:
    for(int i=0; i<1000000000; i++)
    {
        if(i%10 == 0) num_vec.push_back(i);
    }
    I wanted to exploit multithreading, and I thought about using OpenMP. Appending data to the vector is a critical part, I guess, so is it correct for me to parallelize my program as follows?

    Code:
    #pragma omp parallel for
    for(int i=0; i<1000000000; i++)
    {
        #pragma omp critical
        if(i%10 == 0) num_vec.push_back(i);
    }
    Thanks for help in advance.

    Best,
    Niles.

  2. #2
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Why do you want multithreading for this?

    Your loop can be made into:
    Code:
    const unsigned long numElements = 1000000000;
    num_vec.resize(numElements/10);    // Wrote the number like that for clarity
    for(int i=0; i<numElements; i+=10)
    {
        num_vec.push_back(i);
    }
    And i doubt your multithreaded version is better than my single-threaded version above.

    And this loop should be much easier for openmp to execute in parallel (it closely mimics an example they have on wikipedia: OpenMP - Wikipedia, the free encyclopedia) and shouldn't require any critical sections.

  3. #3
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    with that critical line, I suspect that you're going to spend a lot of time waiting for mutexes. filling a vector in the way you describe is (usually) best done by a single thread. it's likely that a multi-threaded program for this specific example would be slower than a single thread.

  4. #4
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Thanks for the suggestions, both of you. OK, so I will simply insert values instead of appending. But I have come across a different problem (but still OpenMP-related), which is when I use an if-conditional. Here is my code, which merely generates (using boost) initial positions/velocities and propagates the particles (very simple...). If the particle is within a desired range, I want to keep track of it:

    Code:
    #include <iostream>
    #include <boost/random.hpp>
    #include <boost/random/normal_distribution.hpp>
    
    using namespace std;
    
    
    
    int propagate(double X, double Y, double Z, double vX, double vY, double vZ, double time)
    {
        X += vX*time;
        Y += vY*time;
    
        if(sqrt(X*X + Y*Y) < 0.01)
        {
            return 1;
        }
        return 0;
    }
    
    
    
    
    
    
    int main()
    {
        boost::mt19937 engine(static_cast<unsigned int>(time(0)));
        boost::normal_distribution<double> nd(0.0, 1.0);
        boost::variate_generator< boost::mt19937, boost::normal_distribution<double> > normal_std_one(engine, nd);
    
    
    
    
        double coordX, coordY, coordZ, time;
        double velX, velY, velZ;
        int test;
        int app=0;
    
        int i;
        #pragma omp parallel for
        for(i=0; i<2000000; i++)
        {
            coordX = 0.01*normal_std_one();
            coordY = 0.01*normal_std_one();
            coordZ = 0.0;
    
            velX = normal_std_one();
            velY = normal_std_one();
            velZ = 20.0*normal_std_one()+300;
    
    
    
            time = 1.0/velZ;
            test = propagate(coordX, coordY, coordZ,
                             velX,   velY,   velZ, time);
    
            if(test==1)
            {
    
                app++;
            }
        }
        cout << app << endl;
        return 0;
    
    }
    If I comment out the parallelization, I get (on average) a different value than if I keep the parallelization. How come that is?
    Last edited by Niels_M; 04-08-2013 at 09:35 AM.

  5. #5
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Well all variables you are accessing within the loop are declared before the loop (and before the parallelization), but you don't guard the access. Which means you might well use coordX from one thread, coordY from another and coordZ from a third in all your calculations.

    You absolutely must ensure all access to memory shared between threads are guarded. In this case however an easy fix would be to declare all variables except for app inside the loop, and guard app when you write to it.

    Edit:
    Saw a GIGANTIC flaw in my code above, this is the corrected version, posted here because I can not edit my original post:
    Code:
    const unsigned long numElements = 1000000000;
    num_vec.resize(numElements/10);    // Wrote the number like that for clarity
    for(unsigned long i=0; i<numElements; i+=10)
    {
        num_vec[i/10] = i;    // forgot to change push_back to assignment!
    }
    Last edited by Shakti; 04-08-2013 at 11:07 AM.

  6. #6
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Shakti View Post
    Edit:
    Saw a GIGANTIC flaw in my code above, this is the corrected version, posted here because I can not edit my original post:
    Code:
    const unsigned long numElements = 1000000000;
    num_vec.resize(numElements/10);    // Wrote the number like that for clarity
    for(unsigned long i=0; i<numElements; i+=10)
    {
        num_vec[i/10] = i;    // forgot to change push_back to assignment!
    }
    I noted it when you posted to begin with, but thanks!




    Quote Originally Posted by Shakti View Post
    Well all variables you are accessing within the loop are declared before the loop (and before the parallelization), but you don't guard the access. Which means you might well use coordX from one thread, coordY from another and coordZ from a third in all your calculations.

    You absolutely must ensure all access to memory shared between threads are guarded. In this case however an easy fix would be to declare all variables except for app inside the loop, and guard app when you write to it.
    OK, so I guess what I need to do is to

    1) declare velocityX, velocityY, velocityZ, dsdfistanceX, distanceY, distanceZ as private
    2) declare app as reduction(+:app)
    3) declare test as private

    Applying these changes still gives me a minor discrepancy (on average) between the parallelized and unparallelized version.

    Thanks for your help so far, I really appreciate it.

  7. #7
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Well the variate_generator is not thread safe. You are accessing the same generator from 2 threads, so that needs protection as well.

  8. #8
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Shakti View Post
    Well the variate_generator is not thread safe. You are accessing the same generator from 2 threads, so that needs protection as well.
    Why would that be necessary? I mean, it is just a random number generator, would that make any difference?

    Nonetheless, I tried to do what you suggest, but none of the following three versions could compile

    Code:
    private(normal_std_one(engine, nd))
    private(normal_std_one())
    private(normal_std_one)

  9. #9
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    private is a keyword, not a function.

    Code:
    class foo
    {
    private:
      int somePrivateData;
    public:
      int somePublicData;
    };

  10. #10
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Elkvis View Post
    private is a keyword, not a function.
    I meant is as in


    Code:
    #pragma omp parallel for private(normal_std_one(engine, nd))
    #pragma omp parallel for private(normal_std_one())
    #pragma omp parallel for private(normal_std_one)

  11. #11
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    OK, so I tried adding

    Code:
    #pragma omp critical
    which compiles and executes as expected -- but the execution time is similar to that of the unparallelized version. Is there really no other way for me to achieve this? This is my final question on the topic (and in this thread), then I have to move on and finish the project... thanks in advance for your help.

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Niels_M View Post
    Code:
    int propagate(double X, double Y, double Z, double vX, double vY, double vZ, double time)
    {
        X += vX*time;
        Y += vY*time;
    
        if(sqrt(X*X + Y*Y) < 0.01)
        {
            return 1;
        }
        return 0;
    }
    Some things don't parallelise well, or specifically adding some OpenMP stuff isn't often going to magically improve things. In many cases you still have to design things around parallelisation. That means thinking about things such as writing to the same cache line from multiple threads hurting performance etc.

    If you simply want to speed this up, then the first thing I would do is change the line with the sqrt in it. sqrt is hideously slow compared to well just about everything.
    Simply square both sides of the equation, giving rise to this, where the sqrt is gone:
    Code:
    if (X*X + Y*Y < 0.01 * 0.01)
    It's also pointless passing Z and vZ if you don't use those, though the compiler will likely optimise that out in a release build.

    You are timing release builds right?
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  13. #13
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by iMalc View Post
    Some things don't parallelise well, or specifically adding some OpenMP stuff isn't often going to magically improve things. In many cases you still have to design things around parallelisation. That means thinking about things such as writing to the same cache line from multiple threads hurting performance etc.

    If you simply want to speed this up, then the first thing I would do is change the line with the sqrt in it. sqrt is hideously slow compared to well just about everything.
    Simply square both sides of the equation, giving rise to this, where the sqrt is gone:
    Code:
    if (X*X + Y*Y < 0.01 * 0.01)
    It's also pointless passing Z and vZ if you don't use those, though the compiler will likely optimise that out in a release build.
    Thanks for your contribution, I have made those changes. Even with those optimizations, my program will take a horribly long time to finish. I have the opportunity to run it on a machine with 12 cores, so I thought I could take advantage of that. Even reducing the runtime by a small amount is desirable for me. But is it really true that generating numbers with boost cannot be parallelized?



    Quote Originally Posted by iMalc View Post
    You are timing release builds right?
    Yes.

  14. #14
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Does anyone know if I would be able to manage this, if I switched over to Boost's threading tools?

  15. #15
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Niels_M View Post
    Does anyone know if I would be able to manage this, if I switched over to Boost's threading tools?
    using boost will simply require you to do explicitly in code, what OMP is doing for you implicitly. you'll have to explicitly create and lock mutexes to access the vector, among other things.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Where is my flaw, regarding use of OpenMP critica
    By ass in forum C++ Programming
    Replies: 2
    Last Post: 12-03-2011, 05:19 AM
  2. OpenMP question
    By oacikgoz in forum C++ Programming
    Replies: 0
    Last Post: 01-28-2009, 12:52 PM
  3. Thoughts on OpenMP
    By abachler in forum C++ Programming
    Replies: 3
    Last Post: 02-04-2008, 04:25 PM
  4. Internals of OPENMP?
    By chiku123 in forum C++ Programming
    Replies: 1
    Last Post: 10-23-2007, 10:19 AM
  5. openmp
    By l2u in forum Tech Board
    Replies: 0
    Last Post: 04-12-2007, 05:40 PM