Thread: [Math] Cosine Similarity problem

  1. #1
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444

    [Math] Cosine Similarity problem

    1. Sorry if this is a math related question in the C board, but this forum lacks a "General Programming" board!

    2. I am currently working on a decision supporting script that takes a number of user inputs and outputs a technology that is best suited to the user needs. In short, I have a huge list of criteria and a list of technologies with their correspondence to these criteria. The user choses the criteria he is interested in, choses the value of the criteria and the script tells him which technology suits his needs.

    I didn't want to build a decision tree because of the high granularity of possible values a decision can take and the horrible inefficiency of node duplication (actually, the reasoning about why I didn't take a decision tree fills 3 pages in my report) so I decided to chose something else: Cosine Similarity.

    For those of you that don't know what it is, it's the cosine value between two vectors in an N-dimensional space. I'll use the euclidean distance for illustration purpose:

    The technologies are represented by a vector, where each component is the correspondence to a certain criteria.
    The user input is his interest in certain criteria.
    The result of the script is the criteria that minimizes the euclidean distance.

    Illustration:

    Code:
    Technology 1: [0.5, 0.5, 1]
    Technology 2: [1, 0, 0.5]
    
    Input: [1, 0.5, 0.5]
    
    Distance 1: 0.5 + 0 + 0.5 = 1
    Distance 2: 0 + 0.5 + 0 = 0.5
    
    Output: Technology 2
    Up to this point, everything is working fine. I additionally want to give each criteria an importance. I've chosen values between [1, 10] since that allows me to immediatly modify the euclidean distance formular to take into account important:

    dist = sum(imp(i) * abs(x(i) - y(i)))

    A high important factor gives a much higher impact to even small differences, by multiplying the distance by a bigger number. A small difference of "0.1" but with an important of 10 will add "1" to the total distance instead of the unimportant 0.1

    What I can't figure out is to how to express the exact same thing with the cosine similarity. I tried modifying the formula in several ways, but each try failed.

    I know it's not entirely a programming related question, but since computer scientist usually had courses in mathematics, I thought I would try anyway.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,662
    > 1. Sorry if this is a math related question in the C board, but this forum lacks a "General Programming" board!
    It it isn't programming, and it isn't "GD Wibble", then use the "tech" board - moved.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    So I think so far, you can do the cosine similarity fine (as it's just calculating the cosine between two vectors), but you want to incorporate importance into that equation. To be honest, I have no tried and true solution, but what I'd do, is I'd first calculate the cosine similarity. That would give me the angle between the input vector and the technology vector. After that, I'd calculate the "importance" index of the technology vector, where :

    imp_index = sum(imp[i] * val[i])

    And then I'd divide the angle by the importance index.

    I'm not quite sure this is analoguous to your euclidean example, but I believe this would do what you want.
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  4. #4
    Registered User
    Join Date
    Apr 2007
    Posts
    141
    The metric you are using is closely related to the normalized mean square error defined by

    min_alpha || a * alpha - b ||^2 / || b ||^2 where we minimize over all possible scalars alpha that scale a to look like b and then normalize with respect to the reference vector b. It's easy to see that this is the sin(theta)^2 of the angle theta formed between the vectors a and b. Obviously this is 1 - cos(theta)^2.

    A natural extension would be to use a weighted square norm. Let G be the diagonal weighting matrix. Your modified norm looks like,

    ||a||_G = a' * G * a, where a' is the transpose of a. This is entirely equivalent to replacing a with sqrt(G) * a, prior to taking the norm. Thus if we were to use the modified norm it suffices to replace a with sqrt(G)*a and b with sqrt(G)*b and then proceed with the computation of the normalized norm or more directly the cosine of the angle between the two vectors.

  5. #5
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KONI View Post
    A high important factor gives a much higher impact to even small differences, by multiplying the distance by a bigger number. A small difference of "0.1" but with an important of 10 will add "1" to the total distance instead of the unimportant 0.1

    What I can't figure out is to how to express the exact same thing with the cosine similarity. I tried modifying the formula in several ways, but each try failed.
    Don't modify the formula. Modify the data flowing into it. In general, you are looking for a transformation of the data vector that produces new features whose relative importance is more closely matched.

    The simplest such method would be to scale each feature independently according to some subjective idea of how important it is. This is equivalent to choosing a transformation with non-zero values only along the diagonal of the matrix. But you could use something more complex in order to account for possible dependencies between features.

  6. #6
    Lean Mean Coding Machine KONI's Avatar
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    Thanks a lot for the input. The main difference between multiplying the difference between two components by a certain factor and modifying either the data or the formula in the cosine similarity is the following:

    The cosine calculates the angle between two normalized projections of vectors onto their axis. If I modify that data by applying a "weight", I modify the position of that point on the axis. This doesn't influence one bit the importance of that vector onto the global result but rather the correctness of the result.

    I tried modifying the query vector and/or the corpus vectors to take into account a certain weight, but rather than emphasis a certain criteria, it modifies the result by changing the direction of the vector.


    @Seventhunders:
    What you say makes a lot of sense and I would love to give it a try. The problem is that, even if what you say works, it is computationally to expensive in regard of the two other methods I've found. When I asked that question in the science.math newsgroup, the very intelligent answer I've got was "if you can do it with the block distance, why try and do it with the cosine similarity" ... and he was right. Why try to go through so much trouble to get the cosine to work if I can simply use the euclidean/block distance.


    I've attached a small part of the document I'm working on and if you have anything to add, I appreciate your constructive input.

  7. #7
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KONI View Post
    The cosine calculates the angle between two normalized projections of vectors onto their axis. If I modify that data by applying a "weight", I modify the position of that point on the axis. This doesn't influence one bit the importance of that vector onto the global result but rather the correctness of the result.
    I wasn't talking about changing the importance of vectors (I'm not sure what that even means). I meant to apply scaling to the particular elements of the vector. The idea being to cast the vectors into a new space where the cosine similarity measure gives better results. Finding the best transformation is going to take more effort than just trying a few values you think will work.

    In other words, you are computing the cosine similarity between T(A) and T(B), instead of between A and B.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Need help understanding a problem
    By dnguyen1022 in forum C++ Programming
    Replies: 2
    Last Post: 04-29-2009, 04:21 PM
  2. Memory problem with Borland C 3.1
    By AZ1699 in forum C Programming
    Replies: 16
    Last Post: 11-16-2007, 11:22 AM
  3. Someone having same problem with Code Block?
    By ofayto in forum C++ Programming
    Replies: 1
    Last Post: 07-12-2007, 08:38 AM
  4. A question related to strcmp
    By meili100 in forum C++ Programming
    Replies: 6
    Last Post: 07-07-2007, 02:51 PM
  5. WS_POPUP, continuation of old problem
    By blurrymadness in forum Windows Programming
    Replies: 1
    Last Post: 04-20-2007, 06:54 PM