Hey all,
I am working with a numerical solver for differential equations. Basically I have a lump of matter, which I chop up into billions of tiny cubes, and the algorithm acts on each one: the values in each cell are updated at every timestep from the values in the surrounding cells. It is parallelized so that I can use any number of processors I like.
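To give a feel for the update pattern, here is a toy Jacobi-style neighbor average in NumPy. This is not my actual scheme, just the simplest thing with the same "each cell reads its six face neighbors" dependence:

```python
import numpy as np

def step(u):
    """One toy timestep: replace every interior cell by the average of
    its six face neighbors.  Boundary cells are left untouched."""
    v = u.copy()
    v[1:-1, 1:-1, 1:-1] = (
        u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
        u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
        u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
    ) / 6.0
    return v

u = np.random.rand(8, 8, 8)  # tiny grid just for illustration
u = step(u)
```

The point is that each cell only ever needs its immediate neighbors, which is why each processor only has to exchange the faces of its block once per timestep.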
The problem is that to get the resolution I need, the program requires well over 1 TB of RAM, which is ridiculous. I would like to cut that in half, if possible.
Now, because most of the interesting stuff happens only in select areas of the material I am working with, the idea is to use a grid that is very fine where I need resolution and coarse where I don't. This is pretty well-established practice.
The problem arises with the parallelization. Right now, each processor holds the same volume of space, which amounts to about 20 million grid cells per processor. But if the resolution changes from place to place, those 20 million cells will cover different volumes of space depending on where they sit in the simulation. That means every processor is limited to the same volume as the one containing the finest resolution, which in turn means that processors in the coarse-resolution areas will hold maybe a million grid cells, and will not be working to capacity.
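To put numbers on that last point: with equal volumes, the cell count per processor scales like 1/h³ in the local grid spacing h. The spacing ratio below is made up; I just picked one that matches the 20 million vs. ~1 million figures:

```python
# Back-of-the-envelope for the load imbalance under equal-volume
# decomposition.  Cell count per processor goes as 1/h^3.
fine_cells = 20_000_000   # cells per processor in the fine region
ratio = 2.7               # coarse spacing / fine spacing (assumed, for illustration)
coarse_cells = fine_cells / ratio**3
print(round(coarse_cells))  # roughly one million
```

So a coarse-region processor ends up with ~5% of the work of a fine-region one, even though both own the same volume.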
So I would reduce my memory requirements, but the number of processors I need to do it would skyrocket, which is just as bad.
Now, I had an idea for getting around this, but I have never done parallel programming myself (the parallelization was already in place when I got this code), so I want to run it by you guys before I try it, so that I know I am not wasting my time.
I want to make the processor layout adaptive in the same way the grid is: more processors where the grid cells are finer, and fewer where they are coarser. This means, however, that a processor will have a variable number of neighbors to communicate with, depending on where it sits in the space (processors only need to pass info to their immediate neighbors once per timestep).
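Roughly what I have in mind, as a 1D toy: instead of giving every rank the same volume, cut the domain so each rank gets about the same number of cells. The slab counts and the greedy cut here are just for illustration; I know the real 3D decomposition is harder:

```python
def partition_by_cells(cells_per_slab, nprocs):
    """Cut a row of slabs into nprocs contiguous chunks with roughly
    equal total cell counts (simple greedy prefix cut)."""
    total = sum(cells_per_slab)
    target = total / nprocs
    cuts, acc = [], 0
    for i, c in enumerate(cells_per_slab):
        acc += c
        # place a cut each time we pass the next multiple of the target,
        # allowing at most nprocs - 1 cuts in total
        if len(cuts) < nprocs - 1 and acc >= (len(cuts) + 1) * target:
            cuts.append(i + 1)
    bounds = [0] + cuts + [len(cells_per_slab)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(nprocs)]

# Fine region (big counts) in the middle, coarse at the edges:
slabs = [1, 1, 1, 8, 8, 8, 8, 1, 1, 1]
print(partition_by_cells(slabs, 4))
```

With the fine region in the middle, the ranks there end up owning far fewer slabs (less volume) than the edge ranks, which is exactly the "more processors where the resolution is fine" behavior I want. It also shows the catch: chunk boundaries no longer line up, so neighbor counts vary from rank to rank.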
So: is this a common practice? Is it feasible? Anyone have simpler ideas?
I am having a hard time finding anything useful on Google because my "vocabulary" in this field is pretty limited as of yet.