What is this lock for? Why don't you just split the one big set of numbers into smaller subsets, let each thread sum its own subset, and then add the partial sums together in the main thread? No thread would touch shared state that way, so no lock would be needed.
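Something like the following is what I have in mind. It is only a rough sketch in Python, since I don't know what your code looks like; `parallel_sum`, `numbers`, and `workers` are names I made up for illustration. Each worker sums its own slice and returns the result, and only the main thread combines the partial sums.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(numbers, workers=4):
    """Sum `numbers` by giving each worker thread its own chunk (hypothetical helper)."""
    if not numbers:
        return 0

    # Ceiling division, so at most `workers` chunks are produced.
    chunk_size = -(-len(numbers) // workers)
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]

    # Each thread sums only its own chunk and returns the result.
    # No thread writes to a shared variable, so there is nothing to lock.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))

    # The main thread combines the partial sums.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Note that in standard CPython the GIL keeps threads from speeding up pure-Python arithmetic, so this shows the lock-free structure rather than a guaranteed speedup. The same split-then-combine shape applies in other languages.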