# [Math] Cosine Similarity problem

• 04-16-2007
KONI
[Math] Cosine Similarity problem
1. Sorry for posting a math-related question on the C board, but this forum lacks a "General Programming" board!

2. I am currently working on a decision-support script that takes a number of user inputs and outputs the technology best suited to the user's needs. In short, I have a huge list of criteria and a list of technologies with their correspondence to these criteria. The user chooses the criteria he is interested in, chooses the values of those criteria, and the script tells him which technology suits his needs.

I didn't want to build a decision tree because of the high granularity of possible values a decision can take and the horrible inefficiency of node duplication (actually, the reasoning about why I didn't use a decision tree fills 3 pages in my report), so I decided to choose something else: Cosine Similarity.

For those of you who don't know what it is, it's the cosine of the angle between two vectors in an N-dimensional space. I'll use the Euclidean distance for illustration purposes:

The technologies are represented by vectors, where each component is the correspondence to a certain criterion.
The user input is his interest in certain criteria.
The result of the script is the technology that minimizes the Euclidean distance.

Illustration:

Code:

```
Technology 1: [0.5, 0.5, 1]
Technology 2: [1, 0, 0.5]
Input:        [1, 0.5, 0.5]

Distance 1: 0.5 + 0 + 0.5 = 1
Distance 2: 0 + 0.5 + 0 = 0.5

Output: Technology 2
```
Up to this point, everything is working fine. Additionally, I want to give each criterion an importance. I've chosen values in [1, 10], since that allows me to immediately modify the Euclidean distance formula to take importance into account:

dist = sum(imp(i) * abs(x(i) - y(i)))

A high importance factor gives a much higher impact to even small differences, by multiplying the distance by a bigger number. A small difference of 0.1 with an importance of 10 will add 1 to the total distance instead of the unimportant 0.1.
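The weighted distance described above can be sketched in a short script. This is a minimal Python sketch; the function and variable names are mine, not from the thread, and the importance values are illustrative:

```python
# Weighted block distance: dist = sum(imp[i] * abs(x[i] - y[i]))

def weighted_distance(tech, query, importance):
    return sum(w * abs(t - q) for w, t, q in zip(importance, tech, query))

technologies = {
    "Technology 1": [0.5, 0.5, 1.0],
    "Technology 2": [1.0, 0.0, 0.5],
}
query = [1.0, 0.5, 0.5]
importance = [1, 1, 1]  # all criteria equally important

# Pick the technology with the smallest weighted distance to the query.
best = min(technologies,
           key=lambda name: weighted_distance(technologies[name], query, importance))
print(best)  # with equal weights this reproduces the example: Technology 2
```

Raising one entry of `importance` to 10 makes any mismatch on that criterion dominate the sum, which is exactly the behaviour described above.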

What I can't figure out is how to express the exact same thing with the cosine similarity. I tried modifying the formula in several ways, but every attempt failed.

I know it's not entirely a programming-related question, but since computer scientists usually have had courses in mathematics, I thought I'd try anyway.
• 04-16-2007
Salem
> 1. Sorry if this is a math related question in the C board, but this forum lacks a "General Programming" board!
If it isn't programming, and it isn't "GD Wibble", then use the "tech" board - moved.
• 04-16-2007
Happy_Reaper
So I think that so far you can do the cosine similarity fine (it's just calculating the cosine between two vectors), but you want to incorporate importance into that equation. To be honest, I have no tried-and-true solution, but what I'd do is first calculate the cosine similarity. That would give me the angle between the input vector and the technology vector. After that, I'd calculate the "importance" index of the technology vector, where:

imp_index = sum(imp[i] * val[i])

And then I'd divide the angle by the importance index.

I'm not quite sure this is analogous to your Euclidean example, but I believe this would do what you want.
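Happy_Reaper's heuristic can be sketched as follows. This is an illustration of his suggestion, not a standard similarity measure, and the example vectors and weights are mine:

```python
import math

def angle_between(a, b):
    # ordinary cosine similarity, converted to an angle in radians
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(dot / (na * nb))

def importance_index(vec, imp):
    # imp_index = sum(imp[i] * val[i]) from the post above
    return sum(w * v for w, v in zip(imp, vec))

tech = [1.0, 0.0, 0.5]
query = [1.0, 0.5, 0.5]
imp = [10, 1, 1]

# divide the angle by the importance index; smaller = better match
score = angle_between(tech, query) / importance_index(tech, imp)
```

Note that dividing the whole angle by a scalar rescales all technologies' scores but does not change how much any single criterion contributes to the angle itself, which is the weakness KONI points out later in the thread.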
• 04-16-2007
SevenThunders
The metric you are using is closely related to the normalized mean square error defined by

min_alpha || alpha * a - b ||^2 / || b ||^2, where we minimize over all possible scalars alpha that scale a to look like b, and then normalize with respect to the reference vector b. It's easy to see that this minimum is sin(theta)^2, where theta is the angle formed between the vectors a and b; equivalently, it is 1 - cos(theta)^2.

A natural extension would be to use a weighted square norm. Let G be the diagonal weighting matrix. Your modified norm looks like,

||a||_G^2 = a' * G * a, where a' is the transpose of a. This is entirely equivalent to replacing a with sqrt(G) * a prior to taking the norm. Thus, if we were to use the modified norm, it suffices to replace a with sqrt(G)*a and b with sqrt(G)*b, and then proceed with the computation of the normalized norm, or more directly the cosine of the angle between the two vectors.
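SevenThunders' substitution is straightforward to sketch: scale each component i by sqrt(g_i) and then take the ordinary cosine. A minimal Python sketch, with illustrative weights:

```python
import math

def weighted_cosine(a, b, g):
    # replace a with sqrt(G)*a and b with sqrt(G)*b, G = diag(g),
    # then compute the plain cosine in the transformed space
    sa = [math.sqrt(w) * x for w, x in zip(g, a)]
    sb = [math.sqrt(w) * y for w, y in zip(g, b)]
    dot = sum(x * y for x, y in zip(sa, sb))
    na = math.sqrt(sum(x * x for x in sa))
    nb = math.sqrt(sum(y * y for y in sb))
    return dot / (na * nb)

# With all weights equal, this reduces to the plain cosine similarity.
print(weighted_cosine([0.5, 0.5, 1.0], [1.0, 0.5, 0.5], [1, 1, 1]))
```

Raising g_i stretches axis i, so a disagreement on that criterion opens the angle between the two vectors more than a disagreement on a lightly weighted one.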
• 04-16-2007
brewbuck
Quote:

Originally Posted by KONI
A high importance factor gives a much higher impact to even small differences, by multiplying the distance by a bigger number. A small difference of 0.1 with an importance of 10 will add 1 to the total distance instead of the unimportant 0.1.

What I can't figure out is how to express the exact same thing with the cosine similarity. I tried modifying the formula in several ways, but every attempt failed.

Don't modify the formula. Modify the data flowing into it. In general, you are looking for a transformation of the data vector that produces new features whose relative importance is more closely matched.

The simplest such method would be to scale each feature independently according to some subjective idea of how important it is. This is equivalent to choosing a transformation with non-zero values only along the diagonal of the matrix. But you could use something more complex in order to account for possible dependencies between features.
• 04-17-2007
KONI
Thanks a lot for the input. The main difference between multiplying the difference between two components by a certain factor and modifying either the data or the formula in the cosine similarity is the following:

The cosine calculates the angle between two vectors from their normalized projections onto the axes. If I modify the data by applying a "weight", I modify the position of that point on the axis. This doesn't influence the importance of that vector in the global result one bit, but rather the correctness of the result.

I tried modifying the query vector and/or the corpus vectors to take a certain weight into account, but rather than emphasizing a certain criterion, it modifies the result by changing the direction of the vector.

@Seventhunders:
What you say makes a lot of sense and I would love to give it a try. The problem is that, even if what you say works, it is computationally too expensive compared to the two other methods I've found. When I asked this question on the science.math newsgroup, the very intelligent answer I got was "if you can do it with the block distance, why try and do it with the cosine similarity" ... and he was right. Why go through so much trouble to get the cosine to work if I can simply use the Euclidean/block distance?

I've attached a small part of the document I'm working on; if you have anything to add, I'd appreciate your constructive input.
• 04-17-2007
brewbuck
Quote:

Originally Posted by KONI
The cosine calculates the angle between two vectors from their normalized projections onto the axes. If I modify the data by applying a "weight", I modify the position of that point on the axis. This doesn't influence the importance of that vector in the global result one bit, but rather the correctness of the result.

I wasn't talking about changing the importance of vectors (I'm not sure what that even means). I meant applying scaling to the particular elements of the vector, the idea being to cast the vectors into a new space where the cosine similarity measure gives better results. Finding the best transformation is going to take more effort than just trying a few values you think will work.

In other words, you are computing the cosine similarity between T(A) and T(B), instead of between A and B.
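Brewbuck's point can be sketched as follows: pick a transformation T (here a plain matrix; the values are my own, purely illustrative), map both vectors through it, and compute the ordinary cosine in the new space. A diagonal T reproduces per-feature scaling; off-diagonal entries model dependencies between features:

```python
import math

def transform(T, v):
    # apply the matrix T to the vector v: (T v)_i = sum_j T[i][j] * v[j]
    return [sum(row[j] * v[j] for j in range(len(v))) for row in T]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Illustrative transform: weight criterion 0 heavily, and let criterion 2
# bleed slightly into criterion 1 (a dependency between features).
T = [[10, 0, 0],
     [0, 1, 0.2],
     [0, 0, 1]]

A = [0.5, 0.5, 1.0]
B = [1.0, 0.5, 0.5]
print(cosine(transform(T, A), transform(T, B)))
```

This computes the similarity between T(A) and T(B) rather than between A and B, which is exactly the formulation in the post above.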