Thread: MPI, which way will be best?

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    13

    MPI, which way will be best?

    Hi,

    I'm writing an algorithm in MPI which basically splits up a massive vector amongst different nodes (i.e. node 1 has entries 0-1,000,000, node 2 has 1,000,000 - 2,000,000, etc.) and basically my algorithm requires that some calculation be done on each element of the array and the result gets added to another component of a duplicate array of the same size and split up in the same way which, most likely, won't be located on the same node (i.e. after computing on element 1000 on node 1 I find that I have to add the result to element 1,000,102 which is on node 2). So basically, if each node has n components of the vector on it it's going to have to send the result to n different places AND receive lots of potential changes to its own section of the vector. Now my question is, what is the BEST way to do this? I'm torn between:

    -after computing each element ISend the result (assuming it needs to be sent to another server) and then do a sweep of IReceives to see if there are any results from other servers that are trying to be sent to me.
    -the same thing using buffered send (IBSend). However, memory usage is a huge issue (I'm basically going to make my vector as big as possible) so I don't know how many buffers I need. Should I set up one local buffer for each server (i.e. buffer 2 on server 1 is big enough for a single message and is reserved for IBSend's to server 2). But then what if I need to send to another server but there's still an outgoing message waiting to be received on the other side. I'll have to block until it gets picked up
    -The final option i'm wondering about is MPI-2's windowing features and using the put command to simply place the result where it needs to be placed.

    Can someone who knows a fair amount about MPI performance considerations help me determine which implementation will be the most effective. Any help is greatly appreciated

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    I would do the window ("shared memory") approach. The performance should be similar to what you could achieve "by hand" - so if the performance isn't where it needs to be, then a different approach may be needed.

    gg

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > So basically, if each node has n components of the vector on it it's going to have to send the
    > result to n different places AND receive lots of potential changes to its own section of the vector
    I'd say that your approach is going to drown in communication and synchronisation overheads.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Jun 2009
    Posts
    13
    Well yes. I am going to need to do an obscene amount of communication (potentially terabytes worth) which is why picking the best implementation is a huge deal.

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well is multi-threading it even an option then?

    For example, if you can do 1M calculations for each message you send or receive, then perhaps it's worth it. Whereas 1K calculations might not be.

    Sequential might seem awful, but it will be a hell of a lot simpler to implement (and test and debug). An overly elaborate threaded approach might run for weeks before you discover some rare race condition which breaks everything.

    About the problem:
    For some a[x], which results in updating b[y] belonging to another thread, would you ever need to refer to b[y] again within the current thread?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Jun 2009
    Posts
    13
    other process might add to b[y] but its value will never be need until all processes have completed. Basically what it is is that if I have a vector q and b which are both huge I want to do q= q+Ab where A is actually a sum of PIVOT matrices. So what I do is hard code the pivots so I never have to actually store the matrix A and basically I do the pivots on b by figuring out where each pivot is pivoting to and adding the corresponding element of b to q (so this why I never actually change b)

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Communication using MPI
    By Cell in forum Linux Programming
    Replies: 9
    Last Post: 08-13-2009, 02:28 AM
  2. MPI in C
    By ltee in forum C Programming
    Replies: 5
    Last Post: 03-26-2009, 06:10 AM
  3. Sorting whit MPI
    By isato in forum C Programming
    Replies: 0
    Last Post: 03-03-2009, 10:38 AM
  4. Malloc and MPI
    By moddinati in forum C Programming
    Replies: 17
    Last Post: 03-07-2008, 07:55 PM
  5. MPI programming
    By kris.c in forum Tech Board
    Replies: 1
    Last Post: 12-08-2006, 12:25 PM