For that innermost loop, though, local_max is a reduction variable. You can still modify the sum and pos arrays inside the reduction function, just as you would in a parallel for loop, so that part stays the same.

Unfortunately, judging by the docs, parallel_reduce doesn't play well with lambdas, so you have to do things the C++03 way and write the function object class by hand.
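
Here is a rough sketch of what such a C++03-style reduction body could look like. It assumes the library wants a class with a splitting constructor, an operator() that processes a sub-range, and a join() that merges two partial results. IndexRange, SplitTag, LocalMaxBody and run_local_max are hypothetical names made up for illustration, and the values written to sum and pos are placeholders, since the real loop body isn't shown here. The driver splits the range by hand and runs the halves serially, only to show the order of calls; it is not the library's scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the library's range and split-tag types.
struct IndexRange {
    std::size_t begin, end;
    IndexRange(std::size_t b, std::size_t e) : begin(b), end(e) {}
};
struct SplitTag {};

// C++03-style body: state lives in members instead of lambda captures.
class LocalMaxBody {
public:
    LocalMaxBody(const double* data, double* sum, int* pos)
        : data_(data), sum_(sum), pos_(pos),
          local_max_(-std::numeric_limits<double>::infinity()) {}

    // Splitting constructor: shares the array pointers, but the
    // reduction variable restarts from the identity value for max.
    LocalMaxBody(const LocalMaxBody& other, SplitTag)
        : data_(other.data_), sum_(other.sum_), pos_(other.pos_),
          local_max_(-std::numeric_limits<double>::infinity()) {}

    // Process one sub-range. Writes to sum_ and pos_ hit disjoint
    // indices, so they behave the same as in a parallel for loop.
    void operator()(const IndexRange& r) {
        for (std::size_t i = r.begin; i != r.end; ++i) {
            sum_[i] = 2.0 * data_[i];       // placeholder computation
            pos_[i] = static_cast<int>(i);  // placeholder computation
            if (sum_[i] > local_max_) local_max_ = sum_[i];
        }
    }

    // Merge the partial result of another body into this one.
    void join(const LocalMaxBody& rhs) {
        if (rhs.local_max_ > local_max_) local_max_ = rhs.local_max_;
    }

    double local_max() const { return local_max_; }

private:
    const double* data_;
    double* sum_;
    int* pos_;
    double local_max_;
};

// Hand-rolled driver that mimics split / run / join on two halves.
double run_local_max(const std::vector<double>& data,
                     std::vector<double>& sum, std::vector<int>& pos) {
    assert(sum.size() == data.size() && pos.size() == data.size());
    if (data.empty()) return -std::numeric_limits<double>::infinity();

    const std::size_t n = data.size();
    const std::size_t mid = n / 2;

    LocalMaxBody left(&data[0], &sum[0], &pos[0]);
    LocalMaxBody right(left, SplitTag());  // split off a second body

    left(IndexRange(0, mid));
    right(IndexRange(mid, n));
    left.join(right);                      // combine partial maxima
    return left.local_max();
}
```

With the actual library, the driver would presumably be replaced by a single parallel_reduce call that takes the range and the body; check the docs for the exact range and split-tag types it expects.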