Finding indices efficiently

**KBriggs** · 05-20-2010

Finally figured out why it was skipping indices - when it got to the end of a loop, it left the index > n, which wasn't getting picked up by the conditions. This does the trick:

Code:

  for (n = 1; n < nlim; n++)
  {
    for (p = -n; p <= n; p+=(((q<=-n)||(q>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
    {
      for (q = -n; q <= n; q+=(((p<=-n)||(p>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
      {
	for (r = -n; r <= n; r+=(((p<=-n)||(p>=n)||(q<=-n)||(q>=n)) ? 1 : 2*n))
	{

Now, how can I modify this to consider edge effects? Would it be simplest just to add an if statement to see if the index generated is valid before I do any array access with it? Ie

Code:

  for (n = 1; n < nlim; n++)
  {
    for (p = -n; p <= n; p+=(((q<=-n)||(q>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
    {
      for (q = -n; q <= n; q+=(((p<=-n)||(p>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
      {
	for (r = -n; r <= n; r+=(((p<=-n)||(p>=n)||(q<=-n)||(q>=n)) ? 1 : 2*n))
	{
           if (indices are within bounds)
           { 
               do stuff...

Or is there a more elegent way to deal with this in the for loop condition?

@MK27

Yeah, I know it's gimpy if you like math but you can get that radius with addition and subtraction.

Could you elaborate?

**Sebastiani** · 05-20-2010

Originally Posted by KBriggs

Finally figured out why it was skipping indices - when it got to the end of a loop, it left the index > n, which wasn't getting picked up by the conditions. This does the trick:

Code:

  for (n = 1; n < nlim; n++)
  {
    for (p = -n; p <= n; p+=(((q<=-n)||(q>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
    {
      for (q = -n; q <= n; q+=(((p<=-n)||(p>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
      {
    for (r = -n; r <= n; r+=(((p<=-n)||(p>=n)||(q<=-n)||(q>=n)) ? 1 : 2*n))
    {

Now, how can I modify this to consider edge effects? Would it be simplest just to add an if statement to see if the index generated is valid before I do any array access with it? Ie

Code:

  for (n = 1; n < nlim; n++)
  {
    for (p = -n; p <= n; p+=(((q<=-n)||(q>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
    {
      for (q = -n; q <= n; q+=(((p<=-n)||(p>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
      {
    for (r = -n; r <= n; r+=(((p<=-n)||(p>=n)||(q<=-n)||(q>=n)) ? 1 : 2*n))
    {
           if (indices are within bounds)
           { 
               do stuff...

Or is there a more elegent way to deal with this in the for loop condition?

I suppose you could put the bounds-checking logic into a function, eg:

Code:

int int_is_in_range( int value, int low, int high )
{
    return value >= low && value <= high;
}

**KBriggs** · 05-20-2010

Hm, it needs to be a little more specific than that I think, because remember that I am "located" at a certain cell i,j,k in the array. So when I do relative offsets from that position I need to make sure that the "absolute" position of the offset is inside the boundaries of the space.

basically checking that:

Code:

0<=i+p<xmax
0<=j+q<ymax
0<=k+r<zmax

I wonder if all the added checking involved in this method is going to result in me saving any time over the brute force algorithm? ^_^

So, boundaries taken care of with:

Code:

int isValidIndex(int i, int j, int k, int p, int q, int r, int xmax, int ymax, int zmax)
{
  return (0<=i+p)&&(xmax>i+p)&&(0<=j+q)&&(ymax>j+q)&&(0<=k+r)&&(zmax>k+r);
}

and

Code:

for (n = 1; n < nlim; n++)
  {
    for (p = -n; p <= n; p+=(((q<=-n)||(q>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
    {
      for (q = -n; q <= n; q+=(((p<=-n)||(p>=n)||(r<=-n)||(r>=n)) ? 1 : 2*n))
      {
	for (r = -n; r <= n; r+=(((p<=-n)||(p>=n)||(q<=-n)||(q>=n)) ? 1 : 2*n))
	{
	  if (isValidIndex(i,j,k,p,q,r,xmax,ymax,zmax))

Thanks all for the help, I'm sure I'll have more questions later but that sorts things out for today

**Adak** · 05-20-2010

I was thinking that instead of processing thousands and thousands of array elements with zero's in them, to find the one's, maybe you could build a skiplist of just the squares with one in them. Then with the row and column index to each array element > 0 in the skiplist, find the zero adjacent squares that you needed.

Each skiplist node would be a struct with two int's: row and col. Wikipedia has a nice write up on skiplists and at the bottom of the page, there is also a nice animation showing how they work.

Brilliant stuff, if your program can use it.

**KBriggs** · 05-20-2010

Well I'll get things done like this and see if it is still too slow. I don't think I'll need any more optimization than the working outward from points of interest. Before, I had to loop through the entire array, and then for each element, loop through the whole thing again. Which scales like n^6. So working outward from interest points instead of scanning the whole array brings it closer to n^3, which should be manageable for the size of the input I am working with.

On the other hand, I really like the idea of learning a new trick ^_^ So maybe I'll try it anyway if I have time after I get the rest of the algorithm working.

Incidentally Salem, if the data set gets -really- big, what are the chances that the array will not be able to fit in one big block of memory? As an example, I expect the average array here to have about 150 million ints in it, mostly 0.

**Adak** · 05-21-2010

I can't say it's especially fast, but it does work.

Code:

/* Search the 2 Dimension array, outward from any square. 

Adak, May 20th, 2010.

status: ok in minimal testing
*/

#include <stdio.h>
#define RNUM 9
#define CNUM 9

void foundOne(int r, int c);
void showIt(int num[RNUM][CNUM]);
void search(int num[RNUM][CNUM], int r, int c, int rad); 

int main() {
  int i, j, r, c, rad; 
  int num[RNUM][CNUM] = { 
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0, 0}
  };
  printf("\n\n");

  for(r= 0;r< RNUM; r++) {
    for(c= 0;c< CNUM; c++) {
      for(rad=1; rad<RNUM; rad++) {
        num[r][c] = 9;
        search(num, r, c, rad);
        showIt(num);
        getch();
        for(i=0;i<RNUM;i++) {
          for(j=0;j<CNUM;j++)
            num[i][j] = 0;
        }
      } 
    }
  }
  printf("\n\n\t\t\t     press enter when ready");

  i = getchar(); ++i;
  return 0;
}
/* foundOne() is not used, but would show where a value 
   you were searching for, was found.
*/
void foundOne(int r, int c) {
  printf("\nfound a digit at num[%d][%d]", r, c);
}
/* search() is the heart of the program. It "walks" around 
   any square in the array, in increasing radii.
*/
void search(int num[RNUM][CNUM], int r, int c, int rad) {
  int r1, c1;

  for(r1=r-rad, c1=c-rad; c1 <= c+rad; c1++) { //top row 
    if(r1 < 0 || c1 < 0 || r1 >= RNUM || c1 >= CNUM) 
      continue;
    num[r1][c1] = rad;
    //if(num[r1][c1]) foundOne(r1,c1);
  }
  for(++r1, --c1; r1 <= r+rad; r1++) { //right side, going down 
    if(r1 < 0 || c1 < 0 || r1 >= RNUM || c1 >= CNUM) 
      continue;
    num[r1][c1] = rad;
    //if(num[r1][c1]) foundOne(r1,c1);
  }
  for(--c1, --r1; c1 >= c-rad; c1--) { //bottom row, going left 
    if(r1 < 0 || c1 < 0 || r1 >= RNUM || c1 >= CNUM) 
       continue;
    num[r1][c1] = rad;
    //if(num[r1][c1]) foundOne(r1,c1);
  }
  for(--r1, ++c1; r1 > r-rad; r1--) { //left side, going up, and minus 1 sqr
    if(r1 < 0 || c1 < 0 || r1 >= RNUM || c1 >= CNUM) 
       continue;
    num[r1][c1] = rad;
    //if(num[r1][c1]) foundOne(r1,c1);
  }  
}
/* shows what squares have been visited by the latest search */
void showIt(int num[RNUM][CNUM]) {
  int r, c;
  printf("\n\n");
  for(r=0;r<RNUM; r++) {
    for(c=0;c<CNUM; c++) {
      printf("%d ", num[r][c]);
    }
    printf("\n");
  }
}

**KBriggs** · 05-21-2010

Heh, cool, but I already have that part of the code working.

I was thinking about the skiplist idea - how is the skiplist any more efficient than going through the array, since I need to go through the array in order to build the skiplist?

MK27 said earlier that I could find distances using only addition and sutraction, but I don't see how. Can someone explain that?

**tabstop** · 05-21-2010

Originally Posted by KBriggs

Heh, cool, but I already have that part of the code working.

I was thinking about the skiplist idea - how is the skiplist any more efficient than going through the array, since I need to go through the array in order to build the skiplist?

MK27 said earlier that I could find distances using only addition and sutraction, but I don't see how. Can someone explain that?

Not quite serious answer: You use + and -.

More serious answer: If you're looking for "blocks that are one unit away from (i,j,k)", that means you need max(abs(x-i), abs(y-j), abs(z-k)) == 1. All your blocks in the ring that are two units away have the max equal to 2. If you were looking only in the cardinal directions, the "taxicab" distance from block (x,y,z) to (i,j,k) would just be abs(x-i)+abs(y-j)+abs(z-k).

**WDT** · 05-21-2010

Originally Posted by KBriggs

Hey all,

I have a three dimensional int array of 1s and 0s. I need a reasonably efficient way to, given an index i,j,k of a cell containing 0, find the index of the nearest cell that contains a 1. Here I define nearest as euclidian distance, ie sqrt((i-x)^2 + (j-y)^2 +(k-z)^2)). It is quite a simple matter to brute force it, by nesting three loops and testing each cell that contains a one for proximity to the cell of interest and then taking the minimal one, but give that this array may have several 10s of millions of cells this may take longer than I would like.

So: any suggestions on making this a little more efficient? Ideally I would like to find a way to only search for ones within a certain radius of a given cell, but I am stuck with the same problem of determining which cells are in that radius without looping through the whole array.

This is an ideal task for parallel programming.
If it falls within your task, you could consider partitioning the space around the location into an arbitrary number and performing parallel searches.

**KBriggs** · 05-21-2010

More serious answer: If you're looking for "blocks that are one unit away from (i,j,k)", that means you need max(abs(x-i), abs(y-j), abs(z-k)) == 1. All your blocks in the ring that are two units away have the max equal to 2. If you were looking only in the cardinal directions, the "taxicab" distance from block (x,y,z) to (i,j,k) would just be abs(x-i)+abs(y-j)+abs(z-k)

Ah. I need to define spherical shapes around the location of the ones, so I do need the euclidian distance.

This is an ideal task for parallel programming.
If it falls within your task, you could consider partitioning the space around the location into an arbitrary number and performing parallel searches.

I thought about that, but really that is more optimization than this program needs at the moment.

**Adak** · 05-21-2010

Originally Posted by KBriggs

Heh, cool, but I already have that part of the code working.

I was trying to finish this program up yesterday, but it wouldn't let me.

I was thinking about the skiplist idea - how is the skiplist any more efficient than going through the array, since I need to go through the array in order to build the skiplist?

When the data is read into the array, the program must know at that moment, which elements have the 1's. So that's when a skiplist would be built. I was also thinking that you could build a small auxiliary array of structs, and basically do the same thing - keeping the row and column and z value, for all the one's. Then work from those, instead of working from the zero's back to the one's.

If, OTOH, you are just being given the data, with the array already built, then a skiplist or auxiliary array, will do you no good, unless, you have to go through the same data, doing other searches.

MK27 said earlier that I could find distances using only addition and sutraction, but I don't see how. Can someone explain that?

I was ROFL with the + and - answer.

<< Thanks for the laugh, Tabstop! >>

Have you roughly timed this search to see if it will meet your needs or not, as is?

Have you considered Taxicab distance, rather than Euclidian? Seems appropriate given the arrays coordinates:

http://en.wikipedia.org/wiki/Manhattan_distance

**KBriggs** · 05-21-2010

I can't use taxicab distance because the ones represent the centres of features in a material, which will decay outward from there in a gaussian fashion, and for spherical symmetry I need the euclidian distance. Of course it won't be really spherical since it is all cubes, but the cubes should be small enough that it makes no difference.

And no, I haven't done the timing yet - still lots of work to be done on other parts of the program before I start doing anything overly fancy. I just wanted to avoid having 6 nested loops over the entire array ^_^

When the data is read into the array, the program must know at that moment, which elements have the 1's. So that's when a skiplist would be built. I was also thinking that you could build a small auxiliary array of structs, and basically do the same thing - keeping the row and column and z value, for all the one's. Then work from those, instead of working from the zero's back to the one's.

Ah. Gotcha. And yes, I realized that working out from the ones is more useful than finding ones from the zeroes.

**tabstop** · 05-21-2010

Originally Posted by KBriggs

I can't use taxicab distance because the ones represent the centres of features in a material, which will decay outward from there in a gaussian fashion, and for spherical symmetry I need the euclidian distance. Of course it won't be really spherical since it is all cubes, but the cubes should be small enough that it makes no difference.

Only you know if your data works for that. If the cubes ever get out to 3's, then that's no longer true (the farthest away in the 2-cube is sqrt(8), but the farthest away in the three cube is sqrt(27)>5 units away).

**KBriggs** · 05-21-2010

Oh it will likely go to the 100s or more, that's not the point. I can simply not modify any cells that are more than whatever radius I want - but that does require the euclidian distance. Not all of the cubes in the outermost shells will be modified, just the ones roughly in line with the central one for large n, and all of the ones in the shells for small n.

... Actually the taxicab distance might be a resaonable approximation for the cone starting on the central shell and radiating outward in each of the cardinal directions. I'll have to try it and see when I get there.

I am getting way ahead of myself here :P

**KBriggs** · 05-21-2010

Hey all,

just a little update - the program is just about out of testing at the moment and it is blazingly fast compared to what I expected - it can process a 200 million unit array in < 2 seconds. Which is awesome.

Thanks all

Thread: Finding indices efficiently

Thread Tools

Search Thread

Display

Similar Threads

tools for finding memory leaks

Vertices and Indices

Finding primes

MFC :: Finding Child Window of a CWnd* Object?

What does 'indices' mean?