How to use arrays when vectors are not applicable

**Niels_M** · 04-10-2013

Hi

I have a loop, which produces somewhere between 1 and 5 million numbers that I need to keep track of. Normally I would store them all in a std::vector, but since my program is multithreaded, I will not be able to utilize std::vector (because if I did, I would not gain anything performance-wise..).

My plan so far is just to use an array as in

Code:

double arr[5000000]

and insert values as the loop progresses. However, naturally I would have to choose it slightly larger than the number of produced numbers - this means that the last part of the array will be empty. This brings about two questions:

1) Would you guys just neglegt the last empty part of the array, or would you somehow resize it, in order to free the memory?

2) Given that the array has served its purpose and that there is still considerable calculations to be done in the program, would you clear/empty the array in order to free up the memory?

I'm very interested in hearing what your approach would be.

**jimblumberg** · 04-10-2013

Why do you think vectors are not applicable? If you pre-size your vector so you don't have to reallocate constantly you should probably be okay.

@Niels_M Why did you delete your posts?

Jim

**Elkvis** · 04-10-2013

if the array is shared between the threads, you still need to control access to it via locking. in this regard, arrays are no different from vectors.

**Elkvis** · 04-10-2013

the rule (there may be exceptions, but it's better to assume there are not) is this:

access to any data shared between threads must be serialized by a lock.

**Niels_M** · 04-10-2013

Originally Posted by Elkvis

the rule (there may be exceptions, but it's better to assume there are not) is this:

access to any data shared between threads must be serialized by a lock.

I don't understand this. So people like me that are interested in saving their data in between calculations simply wont gain anything from multithreading? Is there really no other container that I can use? This is really so depressing...

**phantomotap** · 04-10-2013

So people like me that are interested in saving their data in between calculations simply wont gain anything from multithreading? Is there really no other container that I can use?

O_o

It means that using multiple threads of execution is not an instant performance gain.

It meas that naively applying simple patterns to applications of multiple threads will not do anything for you.

It means a lot of things, but no that.

Instead of telling us how you'd like to do something ("I want to use array because of threads." is a "how".), tell us what you are trying to accomplish.

Certainly, you could just flail around a bit, and you may eventually manage something great, but just asking for advice is probably going to do a lot more for you.

For example, if you've split your calculations across threads you probably don't need intermediate results shared across threads. You can simply have one `std::vector' per threads and merge the results later.

Soma

**Elkvis** · 04-10-2013

you can gain plenty by multithreading, if each thread doesn't spend all its time accessing shared data. the idea is usually to acquire data from a shared source, do some processing on it, and then store it to a shared destination. for the simple task you originally presented in that other thread, the overwhelming majority of the time of each thread is spent filling a vector, which is probably a task better suited to a single thread.

**Cat** · 04-10-2013

1. The fact you don't know the exact size of the final product worries me - assuming each vector/array element could be calculated independently of all others, you could probably make threads for doing the first million, the second million, etc. numbers, where each thread has its own non-overlapping series of indices. That really makes it behave like allocating n independent arrays that just happen to be sequential in memory. As long as no thread touches the others' memory, it could work.

If you were thinking, however, that the thread would read the current number of elements in the array, store its result as the next element, and then increment the count of elements, chances are you've made things considerably slower than doing it single-threaded. Not only do you need to lock/unlock access to the shared resources (the counter), you're probably going to take a big performance hit on cache coherency.

2. I'd always free memory I wasn't going to use again, whether it was small or large.

3. Threading isn't always going to make things faster, and could in fact slow things down, especially if your processing is I/O heavy rather than CPU heavy (which is becoming more and more the norm as CPU speeds have grown considerably faster than I/O speeds).

**iMalc** · 04-11-2013

You're heading for the wrong solution. Just remove the need to access the array from multiple threads at all...

Give each thread a separate vector to fill in. No locking required for dealing with the vectors then.
After the last thread completes, perform an extra step to combine the results from all the per-thread vectors into one large vector. This may seem like the extra step is just extra overhead, but it's still likely to be faster than the alternative.

All provided you don't have issues regarding the quantity of RAM used for this, of course. Even if you do, I'd still recommend basing things on this approach, but doing it in a smarter way.

**Niels_M** · 04-11-2013

First, thanks for all the replies. There are many good points, so I will answer them independently.

Originally Posted by phantomotap

O_o
Instead of telling us how you'd like to do something ("I want to use array because of threads." is a "how".), tell us what you are trying to accomplish.

You are right, let me tell more about my goal. Here is the most reduced example of my program. It simply generates some initial velocities in (x, y, z), and then propagates them in-deterministically (although it is deterministic in the example, for clarity..). So the majority of the processing is spent appending data to the vector. But I reasoned that I would still be able to gain a reduction in computation time, if I split up the 10¹¹ cars across multiple threads, but at the same time avoiding critical regions/locks.

Code:

#include <iostream>
#include "omp.h"

using namespace std;


void propagate_car_1(double& vX, double& vY, double& vZ)
{
    double bias = 5.0;
    vX += bias;
    vX += bias;
    vZ += bias;
}


void propagate_car_2(double& vX, double& vY, double& vZ)
{
    double bias = 5.0;
    vX += bias;
    vX += bias;
    vZ += bias;
}


int main()
{
    vector<double> final1_vX, final1_vY, final1_vZ;
    vector<double> final2_vX, final2_vY, final2_vZ;
    for(unsigned long long i=0; i<100000000000; ++i) //1E11
    {
        vX = 0.0;
        vY = 0.0;
        vZ = 5.0;

        propagate_car_1(vX, vY, vZ); //IN REALITY THESE `propagate'-functions will be more complex (and non-deterministic....)
                                     //SO I DON'T KNOW THE EXACT SIZE OF THE VECTORS AT THE END
        if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
        {
            final1_vX.push_back(vX);
            final1_vY.push_back(vY);
            final1_vZ.push_back(vZ);
        }

        propagate_car_2(vX, vY, vZ);
        if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
        {
            final2_vX.push_back(vX);
            final2_vY.push_back(vY);
            final2_vZ.push_back(vZ);
        }
    }
    return 0;
}

Originally Posted by Elkvis

you can gain plenty by multithreading, if each thread doesn't spend all its time accessing shared data. the idea is usually to acquire data from a shared source, do some processing on it, and then store it to a shared destination. for the simple task you originally presented in that other thread, the overwhelming majority of the time of each thread is spent filling a vector, which is probably a task better suited to a single thread.

I agree with you on that. I thought accessing arrays using [] did not require locking, but it does apparently. I like the suggestions given below of letting each thread have its own container, and merging them at the end (when the multithreaded part is over).

Originally Posted by Cat

Threading isn't always going to make things faster, and could in fact slow things down, especially if your processing is I/O heavy rather than CPU heavy (which is becoming more and more the norm as CPU speeds have grown considerably faster than I/O speeds).

I am IO-bound, but I hope the below approach will work.

Originally Posted by iMalc

You're heading for the wrong solution. Just remove the need to access the array from multiple threads at all...

Give each thread a separate vector to fill in. No locking required for dealing with the vectors then.
After the last thread completes, perform an extra step to combine the results from all the per-thread vectors into one large vector. This may seem like the extra step is just extra overhead, but it's still likely to be faster than the alternative.

All provided you don't have issues regarding the quantity of RAM used for this, of course. Even if you do, I'd still recommend basing things on this approach, but doing it in a smarter way.

That is definitely the approach I will take. I don't suspect I will be bounded by the RAM-quantity, but even if that was the case, I will definitely take this route. Would it be sufficient to make 2D vectors, where the first dimension is the thread number and the second dimension the actual value to be inserted? I guess I can use push_back in this case?

Thanks for all the replies.

**Elysia** · 04-11-2013

Originally Posted by Niels_M

...Would it be sufficient to make 2D vectors, where the first dimension is the thread number and the second dimension the actual value to be inserted? I guess I can use push_back in this case?

What? Just use the vector as usual.
You know, the only thing that differentiates a vector from an array is that it's dynamic, it can grow if necessary.
You can reserve the necessary amount of items you think you may need and use push_back. That means the vector will not need to grow so much. But if it exceeds its current size, it will reallocate.

**Niels_M** · 04-11-2013

Originally Posted by Elysia

What? Just use the vector as usual.

This is my suggestion, using OpenMP (the inner dimension 1 is arbitrarily chosen, I will make it larger as you suggest):

Code:

    const int outer = omp_get_max_threads();
    vector<vector<double> > a(outer, vector<double>(1));
    
    #pragma omp parallel for
    for(int i=0; i<1000; ++i)
    {
        a[omp_get_thread_num()].pushback(i);        
    }

Is this a correct way to assign a unique, dedicated vector to each thread, thus avoiding having to gain access via locks?

Thanks.

**Elysia** · 04-11-2013

Now I see what you mean. No, don't do that.
Declare a local vector to each thread. It should only have one dimension and accessed only by one thread.
Your vector is currently accessed by all threads, which is bad.
Make some parallel block or something.

**Niels_M** · 04-11-2013

Originally Posted by Elysia

Now I see what you mean. No, don't do that.

Is it because it is bad practice, or because it simply doesn't work? Personally I can't see why it wouldn't work.

That being said, I present my approach below, which I believe does exactly as you propose. The last part is critical, but OK - it is still faster than running everything on a single thread. To my knowledge the variables "local_vec" and "global_vec" can't be made global with this approach (as in, being placed in a separate .cpp-file with the extern keyword). Isn't that a serious flaw of this method, i.e. that I'm not allowed to decide for myself? That is one of the reasons why I liked my suggestion above (if it works).

Code:

    vector<int> global_vec;
    #pragma omp parallel
    {
        vector<int> local_vec; //private to each thread.. pre-allocate the size will also be good

        #pragma omp for
        for(int i=0; i<1000; ++i)
        {
            local_vec.push_back(i);
        }
        #pragma omp barrier

        #pragma omp critical
        {
            for(int i=0; i<local_vec.size(); ++i)
            {
                global_vec.push_back(local_vec(i));
            }
        }
    }

**grumpy** · 04-11-2013

Originally Posted by Niels_M

Is it because it is bad practice, or because it simply doesn't work? Personally I can't see why it wouldn't work.

In general it is bad practice because it doesn't work. You are mistakenly assuming that there is some property of arrays or vectors that prevents data being accessed concurrent by multiple threads. Using a 2D array (which is just an array or arrays) offers no more protection than a 1D array does. Similarly for vectors.

Originally Posted by Niels_M

Isn't that a serious flaw of this method, i.e. that I'm not allowed to decide for myself?

Your comment is hardly fair, since your original method would have shared the same "flaw" if you had correctly synchronised access to the global vector.

There is nothing preventing global_vec from being declared extern, and defined elsewhere. The fact you believe there is shows your lack of understanding of what you are trying to do.

However, if any other functions (in different source files) attempt to modify global_vec, then they need to synchronise access (eg make use of a critical section or mutex before modifying global_vec). If my recollection of openMP is correct, that can be done using a named critical section.

Originally Posted by Niels_M

That is one of the reasons why I liked my suggestion above (if it works).

Too bad your suggestion above does not work.

Rather than the second loop, I'd simply do

Code:

     global_vec.insert(global_vec.end(), local_vec.begin(), local_vec.end());

That operation still needs to be protected within a critical section though. There is NO WAY to avoid the need for synchronisation (e.g. using a critical section) - unless, of course, you are willing to compromise program correctness to achieve speed.

Thread: How to use arrays when vectors are not applicable

Thread Tools

Search Thread

Display

How to use arrays when vectors are not applicable

Similar Threads

Vectors in Arrays

Arrays vs Vectors

byte arrays & vectors

arrays or vectors

arrays and vectors