Words in a Word
Hello! I am stuck and I was hoping someone here could help me. I have this text file of around 55000 words. What I want to do is find, for each word, as much words as can possibly be made by deleting or arranging the letters in that word.
A solution I was thinking is to parse the string and assign each letter to a char variable. Then arrange these characters in every possible way and compare the newly created words to my database to see if they exist. My problem is I do not know how to arrange these letters in every possible way. How can you do this? and if you have another solution please feel free to tell me.
Anagram detection is usually done by converting the things to be compared into a form that is ignorant of the letter ordering. One way to do this is by simply rearranging the letters into a single standard form, e.g. by sorting them. If two words have the same standard form, they are anagrams.
You also take your database and preprocess it by building a map from each standard form to all the words that reduce to it.
Finding every possible rearrangement is far too slow. We're talking about heat death of the universe too slow.
Deleting characters is also done on the standard form. When deleting from the standard form, be sure to skip redundant deletions, e.g. the standard form of anagram is aaagmnr - there's no difference in which a you delete, so only do it once. It may help you to use a standard form that encodes letters just once, remembering their count, i.e. the standard form becomes 3a1g1m1n1r.
Thanks CornedBee! Using anagrams seems to be a much better idea. Let me look into it and I'll post if something goes wrong.