First, thanks for all the replies. There are many good points, so I will answer them independently.
Originally Posted by
phantomotap
O_o
Instead of telling us how you'd like to do something ("I want to use array because of threads." is a "how".), tell us what you are trying to accomplish.
You are right, let me tell you more about my goal. Here is the most reduced example of my program. It simply generates some initial velocities in (x, y, z) and then propagates them non-deterministically (although it is deterministic in the example, for clarity). So the majority of the processing time is spent appending data to the vectors. But I reasoned that I could still reduce the computation time if I split the 1E11 cars across multiple threads, while avoiding critical regions/locks.
Code:
#include <iostream>
#include <vector> // needed for std::vector
#include <omp.h>
using namespace std;
void propagate_car_1(double& vX, double& vY, double& vZ)
{
double bias = 5.0;
vX += bias;
vY += bias;
vZ += bias;
}
void propagate_car_2(double& vX, double& vY, double& vZ)
{
double bias = 5.0;
vX += bias;
vY += bias;
vZ += bias;
}
int main()
{
vector<double> final1_vX, final1_vY, final1_vZ;
vector<double> final2_vX, final2_vY, final2_vZ;
for(unsigned long long i=0; i<100000000000ULL; ++i) //1E11
{
double vX = 0.0;
double vY = 0.0;
double vZ = 5.0;
propagate_car_1(vX, vY, vZ); //IN REALITY THESE `propagate'-functions will be more complex (and non-deterministic....)
//SO I DON'T KNOW THE EXACT SIZE OF THE VECTORS AT THE END
if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
{
final1_vX.push_back(vX);
final1_vY.push_back(vY);
final1_vZ.push_back(vZ);
}
propagate_car_2(vX, vY, vZ);
if(vX < 20.0 && vY < 20.0 && vZ < 20.0)
{
final2_vX.push_back(vX);
final2_vY.push_back(vY);
final2_vZ.push_back(vZ);
}
}
return 0;
}
Originally Posted by
Elkvis
you can gain plenty by multithreading, if each thread doesn't spend all its time accessing shared data. the idea is usually to acquire data from a shared source, do some processing on it, and then store it to a shared destination. for the simple task you originally presented in that other thread, the overwhelming majority of the time of each thread is spent filling a vector, which is probably a task better suited to a single thread.
I agree with you on that. I thought accessing a shared vector with [] did not require locking, but apparently growing it concurrently does. I like the suggestion given below of letting each thread have its own container and merging them at the end (once the multithreaded part is over).
Originally Posted by
Cat
Threading isn't always going to make things faster, and could in fact slow things down, especially if your processing is I/O heavy rather than CPU heavy (which is becoming more and more the norm as CPU speeds have grown considerably faster than I/O speeds).
I am I/O-bound, but I hope the approach below will work anyway.
Originally Posted by
iMalc
You're heading for the wrong solution. Just remove the need to access the array from multiple threads at all...
Give each thread a separate vector to fill in. No locking required for dealing with the vectors then.
After the last thread completes, perform an extra step to combine the results from all the per-thread vectors into one large vector. This may seem like the extra step is just extra overhead, but it's still likely to be faster than the alternative.
All provided you don't have issues regarding the quantity of RAM used for this, of course. Even if you do, I'd still recommend basing things on this approach, but doing it in a smarter way.
That is definitely the approach I will take. I don't expect to be limited by the amount of RAM, but even if I were, I would still take this route. Would it be sufficient to make a 2D vector, where the first dimension is the thread number and the second dimension holds the actual values to be inserted? I guess I can still use push_back in that case?
Thanks for all the replies.