[Math] Cosine Similarity problem
1. Sorry for posting a math-related question on the C board, but this forum lacks a "General Programming" board!
2. I am currently working on a decision-support script that takes a number of user inputs and outputs the technology best suited to the user's needs. In short, I have a huge list of criteria and a list of technologies with their correspondence to these criteria. The user chooses the criteria he is interested in, chooses a value for each criterion, and the script tells him which technology suits his needs.
I didn't want to build a decision tree because of the high granularity of the values a decision can take and the horrible inefficiency of node duplication (the reasoning about why I rejected a decision tree fills 3 pages in my report), so I decided to choose something else: Cosine Similarity.
For those of you who don't know what it is, it's the cosine of the angle between two vectors in an N-dimensional space. I'll use the Euclidean distance for illustration purposes:
Each technology is represented by a vector, where each component is its correspondence to a certain criterion.
The user input is a vector of his interest in each of those criteria.
The result of the script is the technology that minimizes the Euclidean distance to the user input.
Up to this point, everything is working fine. I additionally want to give each criterion an importance. I've chosen values in the range [1, 10], since that allows me to immediately modify the Euclidean distance formula to take importance into account. First the unweighted example, then the weighted formula:
Technology 1: [0.5, 0.5, 1]
Technology 2: [1, 0, 0.5]
Input: [1, 0.5, 0.5]
Distance 1: 0.5 + 0 + 0.5 = 1
Distance 2: 0 + 0.5 + 0 = 0.5
Output: Technology 2
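The example above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `distance`, `technologies` and `user_input` are made up for illustration, and the distance is the sum of absolute differences used in the numbers above (strictly speaking the Manhattan, or L1, distance rather than the Euclidean one).

```python
def distance(x, y):
    """Sum of absolute component-wise differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Vectors taken from the example in this post.
technologies = {
    "Technology 1": [0.5, 0.5, 1],
    "Technology 2": [1, 0, 0.5],
}
user_input = [1, 0.5, 0.5]

# Distance of every technology to the user input.
distances = {name: distance(vec, user_input) for name, vec in technologies.items()}

# The best match is the technology with the smallest distance.
best = min(distances, key=distances.get)

print(distances)
print("Output:", best)
```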
dist = sum(imp(i) * abs(x(i) - y(i)))
A high importance factor gives a much higher impact to even small differences, by multiplying the difference by a bigger number. A small difference of "0.1" with an importance of 10 will add "1" to the total distance instead of the unimportant 0.1.
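As a rough sketch of the weighted formula, assuming the same example vectors as before: the importance values `[1, 10, 1]` are hypothetical and chosen only to show that the weighting can change the outcome, and the function name `weighted_distance` is likewise just for illustration.

```python
def weighted_distance(x, y, imp):
    """dist = sum(imp(i) * abs(x(i) - y(i))) over all components i."""
    return sum(w * abs(a - b) for a, b, w in zip(x, y, imp))


technologies = {
    "Technology 1": [0.5, 0.5, 1],
    "Technology 2": [1, 0, 0.5],
}
user_input = [1, 0.5, 0.5]

# Hypothetical importance values in [1, 10]: the second criterion matters most.
importance = [1, 10, 1]

weighted = {
    name: weighted_distance(vec, user_input, importance)
    for name, vec in technologies.items()
}
best_weighted = min(weighted, key=weighted.get)

print(weighted)
print("Output:", best_weighted)
```

With these assumed weights, the 0.5 difference on the second criterion counts ten times, so Technology 2 ends up further away than Technology 1, even though it was the closer one in the unweighted example.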
What I can't figure out is how to express the exact same thing with the cosine similarity. I tried modifying the formula in several ways, but every attempt failed.
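For reference, the plain (unweighted) cosine similarity that serves as the starting point could be sketched as follows. The function name `cosine_similarity` is an illustrative choice, the vectors are the ones from the example in this post, and the sketch assumes no vector is all zeros. Unlike the distance, a higher value means a better match here.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)  # assumes neither vector is all zeros


technologies = {
    "Technology 1": [0.5, 0.5, 1],
    "Technology 2": [1, 0, 0.5],
}
user_input = [1, 0.5, 0.5]

similarities = {
    name: cosine_similarity(vec, user_input) for name, vec in technologies.items()
}

# The best match is the technology with the largest cosine similarity.
best_match = max(similarities, key=similarities.get)

print(similarities)
print("Output:", best_match)
```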
I know it's not entirely a programming-related question, but since computer scientists have usually taken courses in mathematics, I thought I would try anyway.