I think what people are trying to say is that in order to use this function, you're going to have to learn some C. If you know nothing about arrays or pointers, it's going to be difficult for you to use this code. I can (and will) give you an example, but if you don't understand it you'll find it difficult to modify it for your own uses.

It would help if you posted your attempt, and the compiler errors you got. Then we could explain the problems and point you at what you need to learn about to get it right.

I should preface this by saying I have no idea whatsoever what 'k-means clustering' is. Looking at the code and at Wikipedia, it seems that you will have an input set of N data points with D dimensions, and you'll want to get out a set of C 'clusters', also of D dimensions. The function also gives you a list, N items long, of numbers indicating which cluster each point went into.

So here are the basics of the function. The author hasn't made it easy, since the arguments all have single-letter names.

Code:

/*
 * Parameters:
 *   data:      double**, dimensions n x m, meaning n pointers, each to an array of m doubles
 *   n:         number of data points
 *   m:         number of dimensions per point
 *   k:         desired number of clusters
 *   t:         error tolerance
 *   centroids: double**, dimensions k x m, meaning k pointers, each to an array of m doubles
 *
 * Returns: an array of n ints giving the cluster number of each data point
 */
int *k_means(double **data, int n, int m, int k, double t, double **centroids);

Here's an example of how you *might* call it, if you wanted to type the data in at a prompt when the program runs.

Code:

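#include <stdio.h>
#include <stdlib.h>

/* k_means itself comes from the k-means source; this prototype is only
   needed if its definition appears after main. */
int *k_means(double **data, int n, int m, int k, double t, double **centroids);

int main(void)
{
    int dimension, points, clusters;
    int d, p, c;
    double tolerance;
    double temp;
    int *labels;
    double **centroids;
    double **data;

    printf("\nHow many data points? ");
    scanf("%d", &points);
    printf("Dimension? ");
    scanf(" %d", &dimension);
    printf("num clusters? ");
    scanf(" %d", &clusters);
    printf("error tolerance? ");
    scanf(" %lf", &tolerance);
    printf("\nenter data:\n");

    /* Read data into array: one pointer per point, one double per dimension */
    data = malloc(sizeof(double*) * points);
    for (p = 0; p < points; p++)
    {
        data[p] = malloc(sizeof(double) * dimension);
        for (d = 0; d < dimension; d++)
        {
            scanf(" %lf", &temp);
            data[p][d] = temp;
        }
    }

    /* Allocate space for results */
    centroids = malloc(sizeof(double*) * clusters);
    for (c = 0; c < clusters; c++)
    {
        centroids[c] = malloc(sizeof(double) * dimension);
    }

    labels = k_means(data, points, dimension, clusters, tolerance, centroids);

    printf("\n");
    for (p = 0; p < points; p++) {
        printf("data point %d is in cluster %d\n", p, labels[p]);
    }
    printf("\n");
    for (c = 0; c < clusters; c++) {
        printf("\nCluster %d:\n", c);
        for (d = 0; d < dimension; d++)
        {
            printf("%f ", centroids[c][d]);
        }
    }
    printf("\n");

    /* Release what we allocated. labels comes from k_means, so how (or
       whether) to free it depends on how that function allocates it. */
    for (p = 0; p < points; p++)
        free(data[p]);
    free(data);
    for (c = 0; c < clusters; c++)
        free(centroids[c]);
    free(centroids);
    return 0;
}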

You can compile this with gcc with:

gcc test.c -o test -lm

The "-lm" tells the linker to link the maths library, which you need as k_means uses "pow".

Example run (data points borrowed from the Wikipedia article on the Iris flower data set):

Code:

How many data points? 5
Dimension? 4
num clusters? 2
error tolerance? 0.001
enter data:
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
data point 0 is in cluster 0
data point 1 is in cluster 1
data point 2 is in cluster 1
data point 3 is in cluster 1
data point 4 is in cluster 0
Cluster 0:
5.050000 3.550000 1.400000 0.200000
Cluster 1:
4.733333 3.100000 1.400000 0.200000

If you want to read the data from a file, the code will be very similar.
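For instance, a minimal sketch of the file version might look like the following. read_points is a hypothetical helper of my own, not part of the original code, and it assumes the file contains nothing but whitespace-separated numbers, one point after another.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original k-means code): reads
   points * dimension numbers from fp into a newly allocated double**,
   laid out the way k_means expects its data argument.
   Returns NULL if allocation fails or the file runs out of numbers. */
double **read_points(FILE *fp, int points, int dimension)
{
    double **data = malloc(sizeof *data * points);
    int p, d, i;

    if (data == NULL)
        return NULL;

    for (p = 0; p < points; p++) {
        data[p] = malloc(sizeof **data * dimension);
        if (data[p] == NULL)
            break;
        for (d = 0; d < dimension; d++) {
            /* fscanf returns the number of items it converted */
            if (fscanf(fp, "%lf", &data[p][d]) != 1)
                break;
        }
        if (d < dimension)
            break;              /* ran out of input part-way through a point */
    }
    if (p == points)
        return data;            /* every point was read successfully */

    /* Something failed: free rows 0..p (free(NULL) is harmless) */
    for (i = 0; i <= p; i++)
        free(data[i]);
    free(data);
    return NULL;
}
```

You would open the file with fopen (say "iris.txt", a made-up name), call read_points(fp, points, dimension), check the result for NULL, and then pass it to k_means exactly as in the interactive version.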

If you're thinking that you can do this:

Code:

double data[5][4]= {{5.1, 3.5, 1.4, 0.2},
{4.9, 3.0, 1.4, 0.2},
{4.7, 3.2, 1.3, 0.2},
{4.6, 3.1, 1.5, 0.2},
{5.0, 3.6, 1.4, 0.2}};

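Don't. It won't work. A double[5][4] is a single block of 20 doubles, not an array of pointers to rows, so it can't be passed where the function expects a double**. You can do this if you change the type of data in the function, but otherwise you can't.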
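If you really do want to keep the numbers in a fixed-size 2-D array, one workaround that leaves k_means alone is to build a separate array of row pointers and pass that instead. This is only a sketch: fill_row_pointers and the DIMS constant are names I've made up, and it assumes the number of columns is known at compile time.

```c
#define DIMS 4   /* assumed number of dimensions per point */

/* Hypothetical helper (not from the original code): makes rows[i] point at
   the first double of row i of a 2-D array, so that rows can be passed
   where a double** is expected. */
void fill_row_pointers(double *rows[], double data[][DIMS], int n)
{
    int i;

    for (i = 0; i < n; i++)
        rows[i] = data[i];   /* data[i] decays to a pointer to row i */
}
```

With the data array above, you would declare double *rows[5], call fill_row_pointers(rows, data, 5), and then pass rows (not data) as the first argument to k_means. The centroids argument still needs to be a real double** as well, built the same way or with malloc.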