Thread: Parallel processing with fork()

  1. #1
    Registered User
    Join Date
    Aug 2008
    Posts
    5

    Parallel processing with fork()

    I have a math-intensive application, and I'm trying to utilize my quad processor to get a speedup. The job is trivially parallelizable; I'm working through a huge 2D array and the job on each row is totally independent. Fork crunches on the job easily and I get 1/4th of the results sitting in each thread, but I'm stuck on how to combine the results into one thing. Can this be done easily? I'm not an expert programmer and some of the stuff I found when searching on shared memory was pretty daunting. Could I somehow just extract a pointer from the children that points to their data, or something like that?

    It looks like pthreads can do this type of thing, but as far as I can tell it requires functions, unlike fork. In my case this would involve passing a ridiculous number of variables in the function, so I'm hesitant to go this route.

    Thanks for any help.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Just write all the results to 4 separate files (assuming each process is working on 1/4 of the problem). If necessary, a 5th process can trivially combine the temp files into a final result.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    Quote Originally Posted by Mastiff View Post
    It looks like pthreads can do this type of thing, but as far as I can tell it requires functions, unlike fork. In my case this would involve passing a ridiculous number of variables in the function, so I'm hesitant to go this route.
    Yes, when you create a thread, a function is called. But the threads will share global variables, so it would not be strictly necessary to pass a lot of arguments to your thread function; just enough so it knows what part of the dataset it should be working on. Just don't forget to lock (pthread_mutex_lock(), etc) data that might be accessed concurrently.

    Of course, you might (reasonably) scoff at the use of globals. Each method has its downsides, I suppose.

  4. #4
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    fork() creates new processes, not new threads.
    Couldn't you pass a struct that holds all the data instead of individual variables? I think you'd have to anyways, since the thread function can only take a void* parameter.

  5. #5
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    >> I'm hesitant to go this route
    Fear of the unknown is understandable....so make it known

    https://computing.llnl.gov/tutorials/pthreads/
    http://www.cs.umu.se/kurser/TDBC64/V...ead-primer.pdf

    gg

  6. #6
    Registered User
    Join Date
    Aug 2008
    Posts
    5
    Quote Originally Posted by cpjust View Post
    fork() creates new processes, not new threads.
    Couldn't you pass a struct that holds all the data instead of individual variables? I think you'd have to anyways, since the thread function can only take a void* parameter.
    Yes, as I understand it, I'd have to get everything into a structure. So how do people generally go about this? Do they create such a structure and use it throughout the program, including the non-threaded parts? Or do they copy everything into the structure, do the threaded part, and then copy it back out to the normal variables? Either way seems bad due to the sheer messiness of it, especially since the data includes several huge (4Kx8K) arrays.

    Thanks for the help.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What you do depends on what problem you are trying to solve.

    As long as each thread works on its own data, it doesn't really matter how or where that data is stored, or whether it lives in separate structures or one shared struct. Just make sure that no two threads access the SAME region of data.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Chinese pâté foxman
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    You might also want to take a look at OpenMP. It might be what you are looking for (or it might not -- I don't have enough information to really tell). GCC has supported it since version 4.2; you just need to pass the "-fopenmp" flag when compiling and linking, which pulls in the appropriate library (libgomp).
    I hate real numbers.
