I have a math-intensive application, and I'm trying to use my quad-core processor to get a speedup. The job is trivially parallelizable: I'm working through a huge 2D array, and the work on each row is totally independent. fork() splits up the job easily and I end up with 1/4 of the results sitting in each child process, but I'm stuck on how to combine the results into one array in the parent. Can this be done easily? I'm not an expert programmer, and some of the material I found on shared memory was pretty daunting. Could I somehow just get a pointer from each child that points to its data, or something like that?
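To make the setup concrete, here is a minimal sketch of one possible way to do it, assuming a Linux-like system where `mmap` accepts `MAP_SHARED | MAP_ANONYMOUS`. The result buffer is mapped before forking so every child writes straight into memory the parent can read. The names `fork_rows` and `process_row` are hypothetical, and the row sum is only a stand-in for the real math.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: glibc; exposes MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real per-row math: sums one row. */
static double process_row(const double *row, size_t ncols)
{
    double sum = 0.0;
    for (size_t c = 0; c < ncols; c++)
        sum += row[c];
    return sum;
}

/*
 * Forks nworkers children. Each child handles a contiguous band of rows and
 * writes one result per row into a buffer shared with the parent.
 * Returns the shared buffer of nrows doubles (release it with munmap),
 * or NULL on failure.
 */
double *fork_rows(const double *data, size_t nrows, size_t ncols, int nworkers)
{
    assert(nworkers > 0);
    if (nrows == 0)
        return NULL;

    /* Map the buffer BEFORE fork so parent and children share the pages. */
    double *results = mmap(NULL, nrows * sizeof *results,
                           PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (results == MAP_FAILED)
        return NULL;

    int started = 0, ok = 1;
    for (int w = 0; w < nworkers; w++) {
        size_t begin = nrows * (size_t)w / (size_t)nworkers;
        size_t end = nrows * (size_t)(w + 1) / (size_t)nworkers;
        pid_t pid = fork();
        if (pid < 0) {          /* fork failed: stop launching workers */
            ok = 0;
            break;
        }
        if (pid == 0) {         /* child: fill in its own band, then exit */
            for (size_t r = begin; r < end; r++)
                results[r] = process_row(data + r * ncols, ncols);
            _exit(0);
        }
        started++;
    }

    /* Parent: the results are only complete once every child has exited. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    if (!ok) {
        munmap(results, nrows * sizeof *results);
        return NULL;
    }
    return results;
}
```

Because the children write to disjoint rows, no locking is needed in this sketch. The parent would call something like `fork_rows(data, nrows, ncols, 4)` with the array stored row by row, read the results after it returns, and then `munmap` the buffer.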
It looks like pthreads can do this type of thing, but as far as I can tell each thread has to run a separate function, unlike fork, where the child simply continues from the point of the call. In my case this would mean passing a ridiculous number of variables to that function, so I'm hesitant to go this route.
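For comparison, here is a rough sketch of what the pthreads route might look like, assuming the many variables can be bundled into a single struct whose address is passed as the one `void *` argument that `pthread_create` allows. The names `struct job`, `worker`, and `thread_rows`, the `MAX_THREADS` limit, and the row-sum workload are all hypothetical. It would typically be compiled with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bundle of everything a worker needs; add fields as required. */
struct job {
    const double *data;   /* input array, stored row by row */
    double *results;      /* one output slot per row, shared by all threads */
    size_t ncols;
    size_t begin, end;    /* this worker handles rows [begin, end) */
};

/* Thread entry point: receives a pointer to its own struct job. */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (size_t r = j->begin; r < j->end; r++) {
        double sum = 0.0;  /* stand-in for the real per-row math */
        for (size_t c = 0; c < j->ncols; c++)
            sum += j->data[r * j->ncols + c];
        j->results[r] = sum;
    }
    return NULL;
}

/* Splits the rows across nthreads threads. Returns 0 on success, -1 on failure. */
int thread_rows(const double *data, double *results,
                size_t nrows, size_t ncols, int nthreads)
{
    enum { MAX_THREADS = 64 };  /* arbitrary cap chosen for this sketch */
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0;

    for (int t = 0; t < nthreads; t++) {
        jobs[t] = (struct job){
            .data = data,
            .results = results,
            .ncols = ncols,
            .begin = nrows * (size_t)t / (size_t)nthreads,
            .end = nrows * (size_t)(t + 1) / (size_t)nthreads,
        };
        if (pthread_create(&tid[t], NULL, worker, &jobs[t]) != 0)
            break;
        started++;
    }

    /* Threads share the process's memory, so results is filled in place. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return started == nthreads ? 0 : -1;
}
```

With threads there is nothing to combine afterwards: every worker writes into the same `results` array, and once `pthread_join` has returned for each thread the caller can read it directly.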
Thanks for any help.