It sounds like you might want threads. The idea is to give each thread its own range of indices into the vector's data (or into a C-style int array if a vector doesn't work) so that the threads don't get in each other's way. Then you don't need the mutex, and you still get the benefit of multiple threads. This is safe as long as nothing resizes the vector while the threads are running.
Code:
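size_t size = 1000;
vector<int> v(size); // create the elements up front; reserve() alone leaves size() at 0
size_t num_threads = 8;
size_t chunksize = size / num_threads;
vector<thread> threads;
for (size_t i=0; i<num_threads; ++i) {
    // the last thread also takes any remainder left by the integer division
    size_t i_end = (i+1 == num_threads) ? v.size() : (i+1)*chunksize;
    // ref() (from <functional>) is needed: thread copies its arguments by default
    threads.push_back(thread(func, ref(v), i*chunksize, i_end));
}
// wait for every thread to finish before reading v
for (auto& t : threads) t.join();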
Then the threaded function would only access the vector within a particular range.
Code:
void func(vector<int>& v, size_t i_begin, size_t i_end) {
    // touches only indices in [i_begin, i_end)
    for (size_t i = i_begin; i < i_end; ++i)
        v[i] = 1;
}
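For reference, here is a minimal self-contained sketch that puts the two pieces together so they compile as one unit. The helper name fill_parallel is hypothetical (it isn't in the snippets above), and giving the remainder to the last thread is just one reasonable way to split the work, not the only one.
Code:
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Worker: writes only to indices in [i_begin, i_end).
void func(std::vector<int>& v, std::size_t i_begin, std::size_t i_end) {
    for (std::size_t i = i_begin; i < i_end; ++i)
        v[i] = 1;
}

// Hypothetical helper (assumption, not part of the original snippets):
// fills a vector of `size` elements using `num_threads` threads,
// each working on its own disjoint index range.
std::vector<int> fill_parallel(std::size_t size, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<int> v(size);  // elements exist up front, so the vector is never resized
    std::size_t chunksize = size / num_threads;

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
        std::size_t i_begin = i * chunksize;
        // The last thread also takes the remainder of the integer division.
        std::size_t i_end = (i + 1 == num_threads) ? size : i_begin + chunksize;
        // std::ref passes the vector by reference instead of copying it.
        threads.emplace_back(func, std::ref(v), i_begin, i_end);
    }

    // Wait for all threads before the result is read.
    for (std::thread& t : threads)
        t.join();
    return v;
}
```
Calling fill_parallel(1000, 8) should give back a vector of 1000 ones. You may need to enable thread support when building (for example -pthread with GCC or Clang).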