Thread: How to use arrays when vectors are not applicable

  1. #1
    Registered User
    Join Date
    Aug 2009
    Posts
    140

    How to use arrays when vectors are not applicable

    Hi

    I have a loop, which produces somewhere between 1 and 5 million numbers that I need to keep track of. Normally I would store them all in a std::vector, but since my program is multithreaded, I will not be able to utilize std::vector (because if I did, I would not gain anything performance-wise..).

    My plan so far is just to use an array as in
    Code:
    double arr[5000000]
    and insert values as the loop progresses. However, naturally I would have to choose it slightly larger than the number of produced numbers - this means that the last part of the array will be empty. This brings about two questions:

    1) Would you guys just neglegt the last empty part of the array, or would you somehow resize it, in order to free the memory?

    2) Given that the array has served its purpose and that there is still considerable calculations to be done in the program, would you clear/empty the array in order to free up the memory?

    I'm very interested in hearing what your approach would be.

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    Why do you think vectors are not applicable? If you pre-size your vector so you don't have to reallocate constantly you should probably be okay.

    @Niels_M Why did you delete your posts?

    Jim
    Last edited by jimblumberg; 04-10-2013 at 03:11 PM.

  3. #3
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    if the array is shared between the threads, you still need to control access to it via locking. in this regard, arrays are no different from vectors.

  4. #4
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    the rule (there may be exceptions, but it's better to assume there are not) is this:

    access to any data shared between threads must be serialized by a lock.

  5. #5
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Elkvis View Post
    the rule (there may be exceptions, but it's better to assume there are not) is this:

    access to any data shared between threads must be serialized by a lock.
    I don't understand this. So people like me that are interested in saving their data in between calculations simply wont gain anything from multithreading? Is there really no other container that I can use? This is really so depressing...

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    So people like me that are interested in saving their data in between calculations simply wont gain anything from multithreading? Is there really no other container that I can use?
    O_o

    It means that using multiple threads of execution is not an instant performance gain.

    It meas that naively applying simple patterns to applications of multiple threads will not do anything for you.

    It means a lot of things, but no that.

    Instead of telling us how you'd like to do something ("I want to use array because of threads." is a "how".), tell us what you are trying to accomplish.

    Certainly, you could just flail around a bit, and you may eventually manage something great, but just asking for advice is probably going to do a lot more for you.

    For example, if you've split your calculations across threads you probably don't need intermediate results shared across threads. You can simply have one `std::vector' per threads and merge the results later.

    Soma

  7. #7
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    you can gain plenty by multithreading, if each thread doesn't spend all its time accessing shared data. the idea is usually to acquire data from a shared source, do some processing on it, and then store it to a shared destination. for the simple task you originally presented in that other thread, the overwhelming majority of the time of each thread is spent filling a vector, which is probably a task better suited to a single thread.

  8. #8
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    1. The fact you don't know the exact size of the final product worries me - assuming each vector/array element could be calculated independently of all others, you could probably make threads for doing the first million, the second million, etc. numbers, where each thread has its own non-overlapping series of indices. That really makes it behave like allocating n independent arrays that just happen to be sequential in memory. As long as no thread touches the others' memory, it could work.

    If you were thinking, however, that the thread would read the current number of elements in the array, store its result as the next element, and then increment the count of elements, chances are you've made things considerably slower than doing it single-threaded. Not only do you need to lock/unlock access to the shared resources (the counter), you're probably going to take a big performance hit on cache coherency.

    2. I'd always free memory I wasn't going to use again, whether it was small or large.

    3. Threading isn't always going to make things faster, and could in fact slow things down, especially if your processing is I/O heavy rather than CPU heavy (which is becoming more and more the norm as CPU speeds have grown considerably faster than I/O speeds).
    Last edited by Cat; 04-10-2013 at 06:06 PM.
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You're heading for the wrong solution. Just remove the need to access the array from multiple threads at all...

    Give each thread a separate vector to fill in. No locking required for dealing with the vectors then.
    After the last thread completes, perform an extra step to combine the results from all the per-thread vectors into one large vector. This may seem like the extra step is just extra overhead, but it's still likely to be faster than the alternative.

    All provided you don't have issues regarding the quantity of RAM used for this, of course. Even if you do, I'd still recommend basing things on this approach, but doing it in a smarter way.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    First, thanks for all the replies. There are many good points, so I will answer them independently.

    Quote Originally Posted by phantomotap View Post
    O_o
    Instead of telling us how you'd like to do something ("I want to use array because of threads." is a "how".), tell us what you are trying to accomplish.
    You are right, let me tell more about my goal. Here is the most reduced example of my program. It simply generates some initial velocities in (x, y, z), and then propagates them in-deterministically (although it is deterministic in the example, for clarity..). So the majority of the processing is spent appending data to the vector. But I reasoned that I would still be able to gain a reduction in computation time, if I split up the 1011 cars across multiple threads, but at the same time avoiding critical regions/locks.

    Code:
    #include <iostream>
    #include "omp.h"
    
    using namespace std;
    
    
    void propagate_car_1(double& vX, double& vY, double& vZ)
    {
        double bias = 5.0;
        vX += bias;
        vX += bias;
        vZ += bias;
    }
    
    
    void propagate_car_2(double& vX, double& vY, double& vZ)
    {
        double bias = 5.0;
        vX += bias;
        vX += bias;
        vZ += bias;
    }
    
    
    int main()
    {
        vector<double> final1_vX, final1_vY, final1_vZ;
        vector<double> final2_vX, final2_vY, final2_vZ;
        for(unsigned long long i=0; i<100000000000; ++i) //1E11
        {
            vX = 0.0;
            vY = 0.0;
            vZ = 5.0;
    
            propagate_car_1(vX, vY, vZ); //IN REALITY THESE `propagate'-functions will be more complex (and non-deterministic....)
                                         //SO I DON'T KNOW THE EXACT SIZE OF THE VECTORS AT THE END
            if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
            {
                final1_vX.push_back(vX);
                final1_vY.push_back(vY);
                final1_vZ.push_back(vZ);
            }
    
            propagate_car_2(vX, vY, vZ);
            if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
            {
                final2_vX.push_back(vX);
                final2_vY.push_back(vY);
                final2_vZ.push_back(vZ);
            }
        }
        return 0;
    }

    Quote Originally Posted by Elkvis View Post
    you can gain plenty by multithreading, if each thread doesn't spend all its time accessing shared data. the idea is usually to acquire data from a shared source, do some processing on it, and then store it to a shared destination. for the simple task you originally presented in that other thread, the overwhelming majority of the time of each thread is spent filling a vector, which is probably a task better suited to a single thread.
    I agree with you on that. I thought accessing arrays using [] did not require locking, but it does apparently. I like the suggestions given below of letting each thread have its own container, and merging them at the end (when the multithreaded part is over).


    Quote Originally Posted by Cat View Post
    Threading isn't always going to make things faster, and could in fact slow things down, especially if your processing is I/O heavy rather than CPU heavy (which is becoming more and more the norm as CPU speeds have grown considerably faster than I/O speeds).
    I am IO-bound, but I hope the below approach will work.



    Quote Originally Posted by iMalc View Post
    You're heading for the wrong solution. Just remove the need to access the array from multiple threads at all...

    Give each thread a separate vector to fill in. No locking required for dealing with the vectors then.
    After the last thread completes, perform an extra step to combine the results from all the per-thread vectors into one large vector. This may seem like the extra step is just extra overhead, but it's still likely to be faster than the alternative.

    All provided you don't have issues regarding the quantity of RAM used for this, of course. Even if you do, I'd still recommend basing things on this approach, but doing it in a smarter way.
    That is definitely the approach I will take. I don't suspect I will be bounded by the RAM-quantity, but even if that was the case, I will definitely take this route. Would it be sufficient to make 2D vectors, where the first dimension is the thread number and the second dimension the actual value to be inserted? I guess I can use push_back in this case?

    Thanks for all the replies.

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by Niels_M View Post
    ...Would it be sufficient to make 2D vectors, where the first dimension is the thread number and the second dimension the actual value to be inserted? I guess I can use push_back in this case?
    What? Just use the vector as usual.
    You know, the only thing that differentiates a vector from an array is that it's dynamic, it can grow if necessary.
    You can reserve the necessary amount of items you think you may need and use push_back. That means the vector will not need to grow so much. But if it exceeds its current size, it will reallocate.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  12. #12
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Elysia View Post
    What? Just use the vector as usual.

    This is my suggestion, using OpenMP (the inner dimension 1 is arbitrarily chosen, I will make it larger as you suggest):
    Code:
        const int outer = omp_get_max_threads();
        vector<vector<double> > a(outer, vector<double>(1));
        
        #pragma omp parallel for
        for(int i=0; i<1000; ++i)
        {
            a[omp_get_thread_num()].pushback(i);        
        }
    Is this a correct way to assign a unique, dedicated vector to each thread, thus avoiding having to gain access via locks?

    Thanks.
    Last edited by Niels_M; 04-11-2013 at 04:07 AM.

  13. #13
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Now I see what you mean. No, don't do that.
    Declare a local vector to each thread. It should only have one dimension and accessed only by one thread.
    Your vector is currently accessed by all threads, which is bad.
    Make some parallel block or something.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  14. #14
    Registered User
    Join Date
    Aug 2009
    Posts
    140
    Quote Originally Posted by Elysia View Post
    Now I see what you mean. No, don't do that.
    Is it because it is bad practice, or because it simply doesn't work? Personally I can't see why it wouldn't work.

    That being said, I present my approach below, which I believe does exactly as you propose. The last part is critical, but OK - it is still faster than running everything on a single thread. To my knowledge the variables "local_vec" and "global_vec" can't be made global with this approach (as in, being placed in a separate .cpp-file with the extern keyword). Isn't that a serious flaw of this method, i.e. that I'm not allowed to decide for myself? That is one of the reasons why I liked my suggestion above (if it works).

    Code:
        vector<int> global_vec;
        #pragma omp parallel
        {
            vector<int> local_vec; //private to each thread.. pre-allocate the size will also be good
    
            #pragma omp for
            for(int i=0; i<1000; ++i)
            {
                local_vec.push_back(i);
            }
            #pragma omp barrier
    
            #pragma omp critical
            {
                for(int i=0; i<local_vec.size(); ++i)
                {
                    global_vec.push_back(local_vec(i));
                }
            }
        }
    Last edited by Niels_M; 04-11-2013 at 05:18 AM.

  15. #15
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Niels_M View Post
    Is it because it is bad practice, or because it simply doesn't work? Personally I can't see why it wouldn't work.
    In general it is bad practice because it doesn't work. You are mistakenly assuming that there is some property of arrays or vectors that prevents data being accessed concurrent by multiple threads. Using a 2D array (which is just an array or arrays) offers no more protection than a 1D array does. Similarly for vectors.

    Quote Originally Posted by Niels_M View Post
    Isn't that a serious flaw of this method, i.e. that I'm not allowed to decide for myself?
    Your comment is hardly fair, since your original method would have shared the same "flaw" if you had correctly synchronised access to the global vector.

    There is nothing preventing global_vec from being declared extern, and defined elsewhere. The fact you believe there is shows your lack of understanding of what you are trying to do.

    However, if any other functions (in different source files) attempt to modify global_vec, then they need to synchronise access (eg make use of a critical section or mutex before modifying global_vec). If my recollection of openMP is correct, that can be done using a named critical section.

    Quote Originally Posted by Niels_M View Post
    That is one of the reasons why I liked my suggestion above (if it works).
    Too bad your suggestion above does not work.

    Rather than the second loop, I'd simply do
    Code:
         global_vec.insert(global_vec.end(), local_vec.begin(), local_vec.end());
    That operation still needs to be protected within a critical section though. There is NO WAY to avoid the need for synchronisation (e.g. using a critical section) - unless, of course, you are willing to compromise program correctness to achieve speed.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Vectors in Arrays
    By Osiris990 in forum C++ Programming
    Replies: 5
    Last Post: 02-25-2008, 02:48 PM
  2. Arrays vs Vectors
    By swgh in forum C++ Programming
    Replies: 5
    Last Post: 05-04-2006, 02:06 AM
  3. byte arrays & vectors
    By kasun in forum C++ Programming
    Replies: 1
    Last Post: 02-29-2004, 09:10 AM
  4. arrays or vectors
    By Geo-Fry in forum C++ Programming
    Replies: 26
    Last Post: 04-17-2003, 07:08 PM
  5. arrays and vectors
    By volk in forum C++ Programming
    Replies: 1
    Last Post: 03-30-2003, 03:45 PM