# Thread: finding nearest neighbour distance

1. ## finding nearest neighbour distance

hi,

my problem is as follows:

I have a list of N pairs of coordinates labelled (x[i],y[i]) i=1,...,N.

What I want to do is to take every point (x[i],y[i]) and find the Pythagorean distance to it's nearest neighbour.

So I have to loop over all the points, and then take that minimum distance for each point.. the distance will be given by

sqrt((x-x1)^2 +(y-y1)^2)

I am new to C and have no idea how to do this!

Any help would be appreciated

thanks a lot

mo786

2. I would start making a struct "point" which has two members: x and y (floating point numbers).
Then you proceed with making an array of "points".

I hope this helps you to start...

3. I assume that x[] and y[] are arrays. In which case you have N points. Each point (i) has the coordinate (x[i], y[i])
Find the distance of its nearest neighboor? Do you just mean find the nearest neighboor? I mean you cannot know which is the nearest if you don't calculate distances.

The algorithm is as follows in pseudo-code. Assume sqrt and pow (power)functions
Code:
```for(i : 1...N)
distance = sqrt(pow(x[i], 2) + pow(y[i], 2));
store distance;```
You can store the distance in a 2D array with a NxN size. The element on (i,j) position in the array be the distance between the points i and j.

Post what code you have until now

4. First some questions and an observation from what has been posted already:
1. What do you plan on doing with the set of nearest points once you have it?
2. How many points can you have?
3. Do you have any experience building any kind of binary trees?
4. How is it that you need to be solving such a potentially relatively complex thing in C when you're new to the language?

There's no need to use a sqrt at all here. Just compare distances-squared.
sqrt(a) < sqrt(b) if-and-only-if a < b

5. When you plot your 3 points, for your pythagorean theorem to work, these points must form a right triagle (90 deg). I guess your requirements or needing you to calculate the destance of the hypotenuse and use this as your distance measurement? There is a lot more at play here unless the points given in your array are guaranteed to form right angled triangles.

6. For the distance between two points, you don't need a third point.
If you take two perpendicular axes X and Y, then the points (x,y) and (x1, y1) are at distance:

sqrt((x-x1)^2 +(y-y1)^2): this formula is based on Pythagoras, but the right angle comes from the angle between the axes.

So, zesty was right with his formula.

7. There is a lot more at play here unless the points given in your array are guaranteed to form right angled triangles.
i was thinking this also, there must be a rule in the problem that is not being explained to allow for the third vertice

8. Here is my attempt for a function that takes the distance between two points:

Code:
```#include <stdio.h>
#include <math.h>

struct point{
double x;
double y;
};

typedef struct point point;

double distance (point a, point b);

int main(){
point a = {1.2, 2.3};
point b = {3.0, 3.1};
printf("%f\n", distance(a,b));
return 0;
}

double distance (point a, point b){
return  sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}```
for an explanation of the formula: LOOK HERE

9. Originally Posted by iMalc
# Do you have any experience building any kind of binary trees?
hmm... ignoring possible answers to your first question for the time being, what use will a binary tree be here? I had the impression that at least N(N+1)/2 (effective) distance computations are needed since the aim is to find the distance to the nearest neighbour for each point.

Originally Posted by iMalc
There's no need to use a sqrt at all here. Just compare distances-squared.
That will work to reduce the amount of computation needed to find the nearest neighbour, but eventually one must take the square root to actually obtain the distance.

10. hmm... ignoring possible answers to your first question for the time being, what use will a binary tree be here? I had the impression that at least N(N+1)/2 (effective) distance computations are needed since the aim is to find the distance to the nearest neighbour for each point.
Perhaps he means something like a quad-tree. While I haven't used such a thing myself, I understand the point is that it allows you to find points close to somewhere. Checking everything against everything else could be expensive (e.g for collision detection in a game).

However, in this excercise, I suppose a N^2 algorithm is acceptable and expected.

That will work to reduce the amount of computation needed to find the nearest neighbour, but eventually one must take the square root to actually obtain the distance.
Yes, but that will be N computations of square root instead of N*(N-1) / 2.

11. I think this program takes the shortest distance between a list of points. My list is a simple array of points. The point is a simple struct with 2 members: x and y coordinate.

Code:
```#include <stdio.h>
#include <math.h>

struct point{
double x;
double y;
};

typedef struct point point;

double distance (point a, point b);

int main(){
int i = 0, j = 0;
point myPoints[5] = {{1.0, 2.0}, {1.0, 4.0}, {2.0,3.0}, {1.0, 3.0}, {3.0, 7.0}};
double result = distance(myPoints[0], myPoints[1]);
printf("%s%f\n", "eerste: ", result);
for (i = 0; i < 5; i++){
for (j = 0; j < 5; j++){
if (i == j)
continue;
if (result > distance(myPoints[i], myPoints[j]))
result = distance(myPoints[i], myPoints[j]);
printf("%d %d\n", i, j);
printf("%f\n", result);
}
}

return 0;
}

double distance (point a, point b){
return  sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}```
I compare the counters i and j, because it doesn't make sense to compare the distance between a point and itself. So in case i == j, the program continues with a next j.
If the new distance is smaller than the result so far, the result gets this smaller value.
At the end of the for-loops, result has the smallest distance.

12. Some minor optimizations
Code:
```int dis;
....
//in for-loop
dis = distance(myPoints[i], myPoints[j]);
if  (result > dis)
result = dis;```
dis will be a variable so you won't calculate twice distance

Code:
```double distance (point a, point b){
int t1 = a.x-b.x;
int t2 = a.y-b.y;
return sqrt(t1*t1, t2*t2);
}```
so you avoid computing a.x-b.x and a.y-b.y twice. Though, the compiler would probably optimize the code for you anyway, just thought to point it out

13. Originally Posted by C_ntua
Some minor optimizations
Code:
```int dis;
....
//in for-loop
dis = distance(myPoints[i], myPoints[j]);
if  (result > dis)
result = dis;```
dis will be a variable so you won't calculate twice distance

Code:
```double distance (point a, point b){
int t1 = a.x-b.x;
int t2 = a.y-b.y;
return sqrt(t1*t1, t2*t2);
}```
so you avoid computing a.x-b.x and a.y-b.y twice. Though, the compiler would probably optimize the code for you anyway, just thought to point it out

14. Originally Posted by laserlight
hmm... ignoring possible answers to your first question for the time being, what use will a binary tree be here? I had the impression that at least N(N+1)/2 (effective) distance computations are needed since the aim is to find the distance to the nearest neighbour for each point.
If the number of points is very large then building a 2D-Tree or Quad-Tree for example would allow one to find the nearest neighbour in less than O(n*n) time.
Sure I could have waited to get an answer to how many points there are first before asking that one, but as we can see it's taking long enough to get a second post as it is.

That will work to reduce the amount of computation needed to find the nearest neighbour, but eventually one must take the square root to actually obtain the distance.
Yes at that step it would be needed, but not before.

15. thanks everyone for all your help! sorry it has taken me so long to reply..

at the moment, i am simply looking for a computation of the order of N^2 where N is the number of points labelled (x(i),y(i)) that I have. Once I have done this, then I will investigate ways to improve my algorithm to lower orders, but just want to deal with the simplest case at the moment.

As, I only have a 2D array, the distance between any two points is of the form as I wrote in my original post. My aim was to take each point, compute the DISTANCES to every other point, and then take the minimum of these distances in order to find it's nearest neighbour. The value of this minimum distance would then be substituted into a seperate formula.

All your posts have been a great help, thanks so much!

zesty
xx