Thread: Matching string by a percentage?

  1. #1
    Registered User
    Join Date
    Oct 2003
    Posts
    104

    Matching string by a percentage?

    Hi there, I haven't been coding for a while, but I think I should start again since I've been away from it for too long.

    I have a set of files in two different directories, say DirA and DirB. The problem is that there may be duplicate files across the directories, and the name of a duplicate may not be spelt exactly the same way, e.g. it may have a couple of extra spaces, or an apostrophe instead of an underscore.

    I really can't remember the term for comparing strings for possible matches. I'm fairly sure there is one, though possibly not in C++. Does anyone know what I'm talking about? Reading up on it should help me come up with an algorithm for deciding which duplicate files to delete.

    By the way, I did a dir dump of all the files in each directory and removed the extensions, so all I have to do now is figure out how to match the duplicates (file name strings) across the directories, output those names into a text file, and then do a batch delete.
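    Before reaching for fuzzy matching, the specific differences described here (extra spaces, an apostrophe instead of an underscore) can often be removed by normalising each name first and then comparing exactly. The sketch below is an assumption, not something proposed in the thread: the normalisation rule and the helper names `normalizeName` and `findDuplicates` are hypothetical. It lower-cases names, treats apostrophes, underscores and whitespace as one separator, and pairs up names from the two dir dumps whose normalised forms are identical.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical normalisation rule (an assumption): lower-case everything,
// treat apostrophes, underscores and whitespace as a single separator,
// collapse runs of separators into one space and trim both ends.
std::string normalizeName(const std::string& name) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char c : name) {
        if (c == '\'' || c == '_' || std::isspace(c)) {
            pendingSpace = !out.empty();  // leading separators are dropped
            continue;
        }
        if (pendingSpace) {
            out += ' ';
            pendingSpace = false;
        }
        out += static_cast<char>(std::tolower(c));
    }
    return out;  // trailing separators never emit a space
}

// Returns (nameInA, nameInB) pairs whose normalised forms are identical.
std::vector<std::pair<std::string, std::string>>
findDuplicates(const std::vector<std::string>& dirA,
               const std::vector<std::string>& dirB) {
    // Index DirB by normalised name so each DirA lookup is a hash lookup.
    std::unordered_map<std::string, std::vector<std::string>> byKey;
    for (const auto& b : dirB) byKey[normalizeName(b)].push_back(b);

    std::vector<std::pair<std::string, std::string>> matches;
    for (const auto& a : dirA) {
        auto it = byKey.find(normalizeName(a));
        if (it == byKey.end()) continue;
        for (const auto& b : it->second) matches.emplace_back(a, b);
    }
    return matches;
}
```

    For example, "Don't Stop" and "don_t  stop" both normalise to "don t stop" and are reported as a pair. The pairs could then be written to the text file used for the batch delete (e.g. with `std::ofstream`). Names that differ in other ways, such as typos, still need a fuzzy comparison.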

  2. #2
    Registered User OnionKnight
    Join Date
    Jan 2005
    Posts
    555
    Fuzzy string matching/searching?

  3. #3
    Registered User
    Join Date
    Apr 2007
    Posts
    1

    Levenshtein distance algorithm

    The Levenshtein distance algorithm is probably what you are looking for. It measures the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one word into another. E.g. "cat" to "caps" has a Levenshtein distance of 2 (change t to p, then add s). The "percentage" change could then be 50% (2 / len("caps")). See http://en.wikipedia.org/wiki/Levenshtein_distance
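    A minimal C++ sketch of the distance described above, using the usual dynamic-programming table but keeping only two rows of it. The function names `levenshtein` and `differencePercent` are my own, and dividing by the length of the longer string is an assumption that matches the cat/caps example (where "caps" is the longer word).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimum number of single-character insertions, deletions or substitutions
// needed to turn a into b. prev holds row i-1 of the DP table, curr row i.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> first j chars of b
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // first i chars of a -> "" takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete a[i-1]
                                curr[j - 1] + 1,      // insert b[j-1]
                                prev[j - 1] + cost}); // substitute (or keep)
        }
        std::swap(prev, curr);  // the row just filled becomes "previous"
    }
    return prev[b.size()];
}

// Distance as a percentage of the longer string's length (assumption).
double differencePercent(const std::string& a, const std::string& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 0.0;  // two empty names are identical
    return 100.0 * static_cast<double>(levenshtein(a, b)) /
           static_cast<double>(longest);
}
```

    For example, `levenshtein("cat", "caps")` returns 2 and `differencePercent("cat", "caps")` returns 50.0. For the duplicate-file problem, pairs of names whose percentage falls below a threshold you choose could be listed for review before anything is deleted.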

  4. #4
    Registered User
    Join Date
    Oct 2003
    Posts
    104
    @nanothief: Thanks, I'd never heard of that; I'll look into it.

    @OnionKnight: Why does your profile picture section take up half the space while the body of your post takes up the other half? Anyway, back to your reply: I'm not sure that's the phrase I remember, but I guess it could work too.

