Malloc and MPI

This is a discussion on Malloc and MPI within the C Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Jun 2007
    Posts
    26

    Malloc and MPI

    I have a couple of questions. First, when using malloc to dynamically create an array, the new array is still contiguous in memory correct?

    Second, I'm trying to send a C structure in MPI in which one of the members is an array. If I specify the array size at compile time, everything works wonderfully. The problem comes when I dynamically create the array: the information does not get passed. Here's an example of some simplified code I'm working on.

    Code:
    #include<stdio.h>
    #include<stdlib.h>
    #include "mpi.h"
    
    //Structure I'm Trying to Send
    struct ListNode
    {
    	double* a;
    	double b;
    	int n;
    };
    typedef struct ListNode Node;
    
    //Function to build derived MPI datatype
    void Build_derived_type(Node* myNode, MPI_Datatype* mesg_mpi_t_ptr);
    
    int main(int argc, char* argv[])
    {
    	int nodeRank=0, numNodes,i;
    	MPI_Datatype MPI_NODE; //New Datatype to send structure
    	MPI_Status status;
    	Node myNode;
    	
    	MPI_Init(&argc,&argv); //Initializes MPI
    	MPI_Comm_size(MPI_COMM_WORLD,&numNodes); //find out how many nodes there are total
    	MPI_Comm_rank(MPI_COMM_WORLD,&nodeRank); //Finds out rank of this node
    
    	if(nodeRank == 0)
    	{
    		myNode.a = malloc(sizeof(double)*5); //Dynamically create array
    		//Fill array
    		myNode.a[0] = 1.1;
    		myNode.a[1] = 2.2;
    		myNode.a[2] = 3.3;
    		myNode.a[3] = 4.4;
    		myNode.a[4] = 5.5;
    		
    		myNode.b = 4.5;
    		myNode.n = 6;
    	
    	}
    	
    	
    	Build_derived_type(&myNode, &MPI_NODE); //Builds derived type
    	
    	//Send structure from node 0 to node 5
    	if(nodeRank == 0)
    	{
    		MPI_Send(&myNode,1,MPI_NODE,5,1,MPI_COMM_WORLD);
    	}
    	//Node 5 receives structure and displays data
    	if(nodeRank == 5)
    	{
    		MPI_Recv(&myNode,1,MPI_NODE,0,1,MPI_COMM_WORLD,&status);
    		printf("Noderank = %d\n",nodeRank);
    		for(i=0;i<5;i++)
    			printf("a[%d] = %f\n",i,myNode.a[i]);
    		printf("b = %f\nn = %d\n",myNode.b,myNode.n);
    	}
    
    	
    	MPI_Finalize();
    	
    	return 0;
    }
    
    void Build_derived_type(Node* myNode, MPI_Datatype* MPI_NODE_ptr)
    {
    	int block_lengths[3];
    	MPI_Aint displacements[3];
    	
    	MPI_Datatype typelists[3];
    	MPI_Aint start_address;
    	MPI_Aint address;
    	//fill block lengths
    	block_lengths[0] = 5;
    	block_lengths[1] = block_lengths[2] = 1;
    	//Fill Typelists
    	typelists[0] = MPI_DOUBLE;
    	typelists[1] = MPI_DOUBLE;
    	typelists[2] = MPI_INT;
    	
    	//Calculate Displacements
    	displacements[0] = 0;
    	MPI_Get_address(&((*myNode).a),&start_address);
    	MPI_Get_address(&((*myNode).b),&address);
    	displacements[1] = address - start_address;
    	MPI_Get_address(&((*myNode).n),&address);
    	displacements[2] = address - start_address;
    	//Create and Commit new mpi type
    	MPI_Type_create_struct(3,block_lengths,displacements,typelists,MPI_NODE_ptr);
    	MPI_Type_commit(MPI_NODE_ptr);
    }
    The program runs with no errors, but node 5 never gets the data from the array; it gets all the other members of the structure just fine.

    Any help is appreciated.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Yes, the data allocated by malloc is contiguous from a software standpoint (but the backing pages that you get may be ANY memory in (usually) 4K blocks).

    But the problem here isn't that: The problem here is that you are passing a data-structure with a pointer inside it - the pointer is passed, but not the data pointed to by the pointer, so whilst a is the same value as on node 0, the memory pointed to by a is something completely random (including the possibility that this location isn't valid on node 5 at all).

    There are several solutions, the most trivial one is to use a "flexible length struct", such as this:
    Code:
    struct ListNode
    {
    	double b;
    	int n;
    	double a[1];
    };
    
    .... 
    
    struct ListNode *myNode;
    
    myNode = malloc(sizeof(*myNode) + sizeof(double) * (5-1));    
    // -1 is because we already have a space for a[0], so we need only allocate the following ones
    
    // Now just use "myNode->" instead of "myNode."
    ... 
    // Send the pointer, not the address of the pointer, so remove &. 
    MPI_Send(myNode, ...);
    ...
    --
    Mats

  3. #3
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    Thanks for the quick reply. I knew it had to be something with the memory; I just wasn't able to pin it down. It makes a lot more sense now. That being said, I tried implementing what you suggested, but the program aborts when it hits the recv. It aborts even when I don't dynamically allocate the memory. I've looked over my code, but I'm not sure where I'm going wrong. If you could look it over again it would be a great help; I'm sure it's just something simple that I missed.

    thanks

    Code:
    #include<stdio.h>
    #include<stdlib.h>
    #include "mpi.h"
    
    //Structure I'm Trying to Send
    struct ListNode
    {
    	double a[1];
    	double b;
    	int n;
    };
    typedef struct ListNode Node;
    
    //Function to build derived MPI datatype
    void Build_derived_type(Node myNode, MPI_Datatype* mesg_mpi_t_ptr);
    
    int main(int argc, char* argv[])
    {
    	int nodeRank, numNodes,i;
    	MPI_Datatype MPI_NODE; //New Datatype to send structure
    	MPI_Status status;
    	Node *myNode;
    	
    	MPI_Init(&argc,&argv); //Initializes MPI
    	MPI_Comm_size(MPI_COMM_WORLD,&numNodes); //find out how many nodes there are total
    	MPI_Comm_rank(MPI_COMM_WORLD,&nodeRank); //Finds out rank of this node
            myNode = malloc(sizeof(myNode) + sizeof(double)*(5-1)); //Dynamically create array
    	if(nodeRank == 0)
    	{
    		
    		//Fill array
    		myNode->a[0] = 1.1;
    		myNode->a[1] = 2.2;
    		myNode->a[2] = 3.3;
    		myNode->a[3] = 4.4;
    		myNode->a[4] = 5.5;
    		
    		myNode->b = 4.5;
    		myNode->n = 6;
    	}
    	
    	
    	Build_derived_type(*myNode, &MPI_NODE); //Builds derived type
    
    	//Send structure from node 0 to node 5
    	if(nodeRank == 0)
    	{
    		MPI_Send(myNode,1,MPI_NODE,5,1,MPI_COMM_WORLD);
    	}
    	//Node 5 receives structure and displays data
    	if(nodeRank == 5)
    	{
    		MPI_Recv(myNode,1,MPI_NODE,0,1,MPI_COMM_WORLD,&status);		
    		printf("Noderank = &#37;d\n",nodeRank);
    		for(i=0;i<5;i++)
    			printf("a[%d] = %f\n",i,myNode->a[i]);
    		printf("b = %f\nn = %d\n",myNode->b,myNode->n);
    	}
    	
    	MPI_Finalize();
    	
    	return 0;
    }
    
    void Build_derived_type(Node myNode, MPI_Datatype* MPI_NODE_ptr)
    {
    	int block_lengths[3];
    	MPI_Aint displacements[3];
    	
    	MPI_Datatype typelists[3];
    	MPI_Aint start_address;
    	MPI_Aint address;
    	//fill block lengths
    	block_lengths[0] = 5;
    	block_lengths[1] = block_lengths[2] = 1;
    	//Fill Typelists
    	typelists[0] = MPI_DOUBLE;
    	typelists[1] = MPI_DOUBLE;
    	typelists[2] = MPI_INT;
    	
    	//Calculate Displacements
    	displacements[0] = 0;
    	MPI_Get_address(&(myNode.a),&start_address);
    	MPI_Get_address(&(myNode.b),&address);
    	displacements[1] = address - start_address;
    	MPI_Get_address(&(myNode.n),&address);
    	displacements[2] = address - start_address;
    	//Create and Commit new mpi type
    	MPI_Type_create_struct(3,block_lengths,displacements,typelists,MPI_NODE_ptr);
    	MPI_Type_commit(MPI_NODE_ptr);
    }
    Also, should the malloc statement contain "sizeof(myNode)" or "sizeof(*myNode)"?

    Thanks again

    EDIT: moved the malloc statement out of the if statement. The code will now run, but with interesting output. Here is what I get.
    Code:
    Noderank = 5
    a[0] = 1.1000000
    a[1] = 4.5000000
    a[2] = 3.2999999
    a[3] = 4.4000000
    a[4] = 5.5000000
    b = 4.5000000
    n = 6
    Last edited by moddinati; 08-02-2007 at 10:28 AM.

  4. #4
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,849
    There can be only one flexible array member in a struct, and it needs to be the last member of the struct.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  5. #5
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    In my actual program, I will end up having three dynamically created arrays as members of my struct. Is there another way to work around this issue?

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Yes, send each array separately (along with a "start-struct" perhaps).

    Or, of course, you could have fixed allocation for the data structure passed, so you send (say) a set of 256 variables.

    If all the data is the same size, you could do something like this:

    Code:
    struct  node_data {
    ...  // some data that is "always there".
      struct variable_size_data {
          double a;
          double b;
          double c;
      } datablock[1];
    };
    --
    Mats

  7. #7
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    So I've been doing some reading, and it seems I have two choices:
    1) Send each array separately, as already mentioned, although the memory would still be non-contiguous, which might cause issues.

    2) I noticed MPI has pack/unpack commands that look like they might work. From my reading, this requires additional overhead every single time it is called, whereas in option 1, building the derived datatype only requires the additional overhead once, at creation.

    So here is my question, does anyone know which would be more efficient in the end? (I will be sending three dynamically created arrays and an integer).

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Assuming your MPI transaction is over gigabit or slower ethernet comms, it's probably not important which method you use, as the slowest part will be getting the data to the other node.

    I'm not entirely sure that MPI pack does what you want anyway.

    --
    Mats

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Further to this - I have looked a bit more at pack/unpack, and I think you CAN do what you want with that, if you would like to.

    There is an advantage in doing a single send compared to multiple send operations, as you use less bandwidth on the network (for small packets of data - with large packets, there's not much difference).

    Are your three data arrays significantly different in size, or could you make something that works along the lines of my a, b and c struct in my earlier post? A small waste isn't going to make much of a difference, but if it starts being a big bunch of "empty data" then it's probably better to use pack/unpack.

    --
    Mats

  10. #10
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    Thanks for taking the time to follow up on it.

    Two of the arrays are always exactly the same size. The magnitude of these can vary a lot, but it will be anywhere in the range from 1 to 500 or so, maybe a little more.

    The other array also varies, but most of the time it will probably be anywhere from 2 to 10 or so.

    As far as speeds of communicators, this code will be run on regular clusters but it will also be run on SMP's, so it will potentially have a faster communication time.

    What I have been trying to do is create two send commands for each array: one for index 0 of the array, since it is stored in the structure and is not contiguous with the rest of the array, and another to send the rest of the array (by starting at index 1 and using the ability to specify a larger count). I would then manually rebuild the structure on the receiving side. My test cluster has been acting up today, and combined with the fact that my experience in MPI is limited, I have not made much progress. I think I might look more into pack/unpack, since optimization is desirable.
    Last edited by moddinati; 08-02-2007 at 06:10 PM.

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by moddinati View Post
    Thanks for taking the time to follow up on it.

    two of the arrays are always exactly the same size. The magnitude of these can vary a lot, but it will be anywhere in the range from 1 to 500 or so, maybe a little more.
    So make this a "variable size at the end of struct".

    the other array also varies, but most of the time it will probably be anywhere from 2 to 10 or so.
    If you can determine the max limit, then I'd make this a fixed size array of 10 or 12 or whatever you think is big enough for every use-case - losing some 8-10 float sizes isn't much in the whole scheme of things.

    So your total structure would look something like this:
    e.g
    Code:
    struct something {
       ...
       int n_xy;
       int n_z;
       ...
       double z[16];
       struct _xy {
          double x;
          double y;
       } xy[1];
    };

    As far as speeds of communicators, this code will be run on regular clusters but it will also be run on SMP's, so it will potentially have a faster communication time.

    What I have been trying to do is create two send commands for each array: one for index 0 of the array, since it is stored in the structure and is not contiguous with the rest of the array, and another to send the rest of the array (by starting at index 1 and using the ability to specify a larger count). I would then manually rebuild the structure on the receiving side. My test cluster has been acting up today, and combined with the fact that my experience in MPI is limited, I have not made much progress. I think I might look more into pack/unpack, since optimization is desirable.
    I would have thought my suggestion above is the most optimal given the conditions you describe - a few dozen bytes of "waste" because the "z" array is too large, but not enough to make much of a difference. The "xy" array is the size it needs to be, so it's not wasting any space.

    I added the fields "n_xy" and "n_z" to indicate how much data there is in each of the arrays.

    --
    Mats

  12. #12
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    So make this a "variable size at the end of struct".
    I have two different arrays of that size, so I don't think that would work.

    If you can determine the max limit, then I'd make this a fixed size array of 10 or 12 or whatever you think is big enough for every use-case - loosing some 8-10 float sizes isn't much in the whole scheme of things]
    I'm not sure if I'm going to be allowed to do that. The code itself is a clustering algorithm in which the magnitude of the data input can change dramatically; those were just rough estimates. Technically, those arrays could be pretty much any size.


    I'm playing around with something right now, but I'm running into an issue. What is the difference between the following two snippets? The program is fine with the first but not the second.
    Code:
    struct Node myNode;
    myNode.array = malloc(sizeof(double)*5); //array is a double* inside the structure
    and

    Code:
    struct Node* myNode;
    myNode->array = malloc(sizeof(double)*5); //array is a double* inside the structure
    Thanks. I'm not really a CS major, so my understanding of the deeper levels of the code is limited.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    My struct suggestion contains two elements that are variable size. As long as the third element has relatively few elements and it is possible to predict the maximum reasonable number, you should be able to use a "two element" variable size array as I've described above.

    But obviously, you can solve the problem by doing multiple send operations - the only problem is that the sending machine goes idle until the other machine has received each packet.

    --
    Mats

  14. #14
    mog
    Join Date
    Dec 2006
    Location
    Caves of Narshe
    Posts
    16
    Quote Originally Posted by moddinati View Post
    ..
    I'm playing around with something right now, but I'm running into an issue. What is the difference between the following two sets of code, because the program likes the first set fine but not the second?
    Code:
    struct Node myNode;
    myNode.array = malloc(sizeof(double)*5); //array is a double* inside the structure
    and

    Code:
    struct Node* myNode;
    myNode->array = malloc(sizeof(double)*5); //array is a double* inside the structure
    You need to allocate space for the structure also.

    Code:
    struct Node *n = malloc(sizeof(struct Node));
    n->array = malloc(5 * sizeof(double));
    Last edited by mog; 08-03-2007 at 03:22 PM. Reason: typo

  15. #15
    Registered User
    Join Date
    Jun 2007
    Posts
    26
    Quote Originally Posted by matsp View Post
    My struct suggestion contains two elements that are variable size. As long as the third element has relatively few elements and it is possible to predict the maximum reasonable number, you should be able to use a "two element" variable size array as I've described above.

    --
    Mats
    I think i misunderstood you initially, thanks for clarifying.

    Quote Originally Posted by mog View Post
    You need to allocate space for the structure also.

    Code:
    struct Node *n = malloc(sizeof(struct Node));
    n->array = malloc(5 * sizeof(double));
    Thanks for the reply. I realized the problem was that I was using the Node ptr without first pointing it to anything. oops.

    Do you know if this is the best forum to post questions for MPI, or is there one specifically for MPI somewhere? I appreciate the help; it's hard to find resources for MPI.

