finding nearest neighbour distance

**zesty** · 11-28-2009

and the number of points that I am dealing with will be of the order of N= 1000 say, so not very large,,

**iMalc** · 11-28-2009

A few people have posted code, most notably 'hilarius', so you must have gotten started now. Perhaps you'd like to post the code you have now, and let us know what part you're finding difficult.

**zesty** · 12-01-2009

the first bit of my code is as follows so far:

insert

Code:

struct point{
	double x;
	double y;
}; 
typedef struct point point;

double distance (point a, point b); 

double distancex (point a, point b);

double distancey (point a, point b);

int main(){
int i = 0, j = 0;
double entropyxy, entropyx, entropyy;
double dis, result;
double disx, resultx;
double disy, resulty;
double digammaone, digammaN;
double mutual_information;


point myPoints[3] = {{1.0, 1.0}, {2.0, 3.0}, {3.0, 4.0}};
	 
	 
	
	 
		for (i = 0; i < 3; i++){
		

		
		result = distance(myPoints[i], myPoints[i+1]);
		printf("first distance:%f\n", result);
		
		for (j = 0; j < 3; j++){
		
		
		
			if (i == j)
			 continue;
				
				
				
           dis = distance(myPoints[i], myPoints[j]);
            printf("%d %d\n", i, j);
            printf("%f\n", dis);
				
           if (result > dis)
           result = dis;
		 
			
          
	}
	
		   printf("shortest: %f\n", result);
	    
		   if(result != 0){entropyxy += (log(2*result));}
		   printf("entropyxy: %f\n", entropyxy);
		   
	
	}

what I need help with is the first bit. As it stands I have only worked with 3 points. But what I want to do is use a data file as my input. This data file contains two columns: first is x values, the second column are all the y values. Each row thus makes 1 point.

So I have a N x 2 array in my input file, where N is the number of points. How do I use this as my input rather than just the 3 points I have in my code at the moment? I understand that I will have to loop over N, and scanf each in but am having trouble with the syntax.

Any help would be appreciated,

Thanks

zesty

**brewbuck** · 12-01-2009

Many points are so far away that you do not have to calculate the actual distance to prove that the point is not the closest point.

Suppose you have the current closest distance to a neighboring point in variable d. Then, if you are considering a new point with x distance dx and y distance dy, then if dx >= d or dy >= d, you know that this point cannot be closer, and you can avoid squaring numbers and taking square roots.

**slingerland3g** · 12-01-2009

Originally Posted by brewbuck

Many points are so far away that you do not have to calculate the actual distance to prove that the point is not the closest point.

Suppose you have the current closest distance to a neighboring point in variable d. Then, if you are considering a new point with x distance dx and y distance dy, then if dx >= d or dy >= d, you know that this point cannot be closer, and you can avoid squaring numbers and taking square roots.

There is a reason behind squaring. Will your solution hold true for all points across any quadrant?

**anon** · 12-02-2009

I don't see why it wouldn't. You can work with squared values and only compute the square root only once.

It doesn't matter if you compare the squared distances (x1 - x2) ** 2 + (y1 - y2) ** 2 or the square roots of it. Similarly, the distance is not going to be smaller, if either (x1 - x2) ** 2 or (y1 - y2) ** 2 is larger than the currently smallest distance squared.

With 1000 points, the time it takes to calculate is almost immeasurable anyway, but with more points, adding this optimization indeed reduces the time a bit (I'm seeing about 1/5 - 1/6 speed-up).

**vart** · 12-02-2009

Originally Posted by brewbuck

Many points are so far away that you do not have to calculate the actual distance to prove that the point is not the closest point.

Suppose you have the current closest distance to a neighboring point in variable d. Then, if you are considering a new point with x distance dx and y distance dy, then if dx >= d or dy >= d, you know that this point cannot be closer, and you can avoid squaring numbers and taking square roots.

if you know current "optimal" differencies DeltaX and DeltaY you can also immediatle discard all point that have (dx>=DeltaX && dy >= DeltaY) || (dx >= DeltaY && dy >= DeltaX) and thus decreasing even more the number of points to be actually checked using full metrics

Thread: finding nearest neighbour distance

Thread Tools

Search Thread

Display

Similar Threads

For the numerical recipes in C types!

Distance Formula in my program..... I need help fast!!!

shortest path problems

Distance Formula Implecations: Urgent!

Fuzzy Logic