Every possible combination/best combination

• 07-18-2007
ldb88
Every possible combination/best combination
I am given a 5x5 matrix of characters, each character having a certain value. I need to find the best (highest sum of the values of the included characters) 3 words that can be made. Any number of letters is fine, so long as it is a word. I have a word list (will probably be read from a file into memory as a hash table) to check to see if the series of characters (string) is a word, so checking to see if a string is valid is not a problem. The hard part is that you can start at any place in the matrix, but you can only move to adjacent (or diagonal) cells. For instance, if the matrix was
Code:

```A B C D E F G H I J K L M N O P Q R S T U V W X Y```
you could use "MNOTS" but you could not use "GSEJ." I know those aren't really words, but you understand what I mean I hope. Also, once you use a letter, it is removed, and you cannot use that letter any more. That spot in the matrix is left blank (so each adjacent/diagonal matrix now has one less option for the next letter). Note that a character can occur in the matrix multiple times (in different cells), but only the one that is used is removed from the matrix. All of this (except the loading of the word list) needs to happen in about 15 seconds on a standard PC.

I have thought about implementing the matrix as a graph (adjacency matrix) and check the links to see which are the possible next letters. This would make it easy to remove a letter from the matrix after it has already been used, and to keep up with which letters can be used next. I thought about making another data structure that will hold all the possible combinations of letters which are in the word list (without removing anything from the matrix), then sorting it (or using an AVL tree and traversing from high to low values) based on the values of the words and finding which 3 were the best that were compatible (did not use the same cell).

My problem is how to get all the possible combinations of characters, especially in a reasonable amount of time. Since any number of characters can be used in a possible combination, that will have to be taken into consideration as well. Once I get the valid combinations, what would be the best way to go about determining the 3 best compatible ones?

If anybody can think of any other way to do this, please let me know. This all seems very time consuming, and I'm not sure that it would work in the short period of time it has to work in (only a few seconds). I am new to graphs, hashes, and trees, so this probably isn't the best way to go about this. But that's why I'm asking.
• 07-18-2007
manofsteel972
Just curios but if you are at the edge do you wrap around or are the edges considered a boundry?
• 07-18-2007
robatino
You could check each word in the list to see if it appears in the matrix. Keep a running list of the 3 highest-scoring words that have currently been found. This will probably work in under a second.
• 07-18-2007
ldb88
Quote:

Just curios but if you are at the edge do you wrap around or are the edges considered a boundry?

It might be a good idea to limit the size of the string to say 10 characters just to make the combinations faster to find, assuming I do it that way.

Quote:

You could check each word in the list to see if it appears in the matrix. Keep a running list of the 3 highest-scoring words that have currently been found. This will probably work in under a second.
I could use a linked list to implement the word list and sort it before the timer starts, so that the best 3 could be found faster. Or I could still use the hash table and use chaining for the collisions (the hash index being the value of the string/sum of the values of the characters). I could just put the ones which can be made from the matrix into a sorted data structure (probably a linked list). Is this what you were referring to? If not, please clarify. As I understand it, I think this would be much faster than the way I had thought up.

As far as I can tell, all of the ways I have thought of would require me to find the best 3 strings that are compatible in a data structure (probably a sorted linked list containing words that are found in the matrix). How would I do this?
• 07-18-2007
robatino
If the word list is fixed, then you just sort it once, and then store it in that order so you never do it again. You can read them into an array or vector of C++ strings, and keep a running list of the indexes of the 3 words with the highest current score. No need for anything as complicated as a hash table.
• 07-18-2007
ldb88
Ok that makes sense. But how will the inability to use the same cell twice factor in? If I just get the best 3, they will likely use the same cell more than once (be incompatible). Also, if I just select the best word overall (index 0) as the first word, find the best word that is compatible with it and make that the 2nd word, and then find the best word that is compatible with those two and make this the 3rd word, that won't necessarily mean that I have the best possible combination, will it? Maybe instead of using indice 0,5,7 I should have used 1,2,3. Using 0,5,7 would likely give good words (or one really good word and 2 decent words), but not necessarily the best combination of words. Am I correct in saying this?

So how do I go about finding the best combination of 3 compatible words from the array or vector?
• 07-18-2007
robatino
I hadn't realized that the letters used in each word can't be used for the other two. That IS more complicated, but one thing you could do is check each word individually, find the words that individually occur in the matrix (all entries usable), and make a list of those, along with the score for each. That list will be much shorter than the original word list. Then you could try to find combinations of 3 compatible words from this much shorter list.

Edit: Reading the specification again I realize now that the characters in the matrix could each occur multiple times. This means that a word might occur in the matrix in more than one combination, and you can't just assume that the three words use disjoint sets of letters. So what I described above probably won't work.
• 07-19-2007
ldb88
Instead of just using plain strings, I could create a structure like this:
Code:

```struct word{     string theWord;     unsigned int* indice;     unsigned int score; };```
and dynamically allocate memory for indice (indice = new unsigned int[theWord.size()]). Then I could check the indice to make sure that they aren't used twice instead of checking the letters to make sure they arent used twice. I think this would be a workaround for the characters occurring multiple times problem.

I am willing to accept that only one sequence of indice will be found for each word, even if there are more paths you could take through the matrix to make the same word. This shouldn't happen unless the best word is only like 3 characters long. Otherwise, it is unlikely that there will be more than one sequence of indice that make up the same string of characters.

Do you think this is feasible?
• 07-19-2007
manofsteel972
This whole thing sounds very much like the game of Boggle. If this is Boggle, then the restriction for using letters once is only within the same word.
• 07-19-2007
robatino
> If this is Boggle, then the restriction for using letters once is only within the same word.
If true, then each word can be processed independently. The processing is a little messier than I thought, but not too difficult. One can use a depth-first search to check if each word occurs in the matrix.
• 07-19-2007
iMalc
Hmm, your description sounds remarkably identical to the game "Boggle".

So you want to write AI for boggle huh?
Boggle also has a 'Qu' on one side of one of the dice. Does your game also allow for that?

Anyway, down to algorithms and data structures. My preference for the word list would be a multi-way trie. Probably a 26-way trie to be more specific. Should make for very fast word validity checking.
Once you've got that down, you need something that Gives you all combinations of letters that agree with the rules for connecting letters etc.
For that, I'd recommend a recursive function, and perhaps a list as one of the parameters. That list would contain the positions used so far, so as to not use them twice.
You would call it 25 times, once for each possible starting position, passing a list containing only that position.
The recusrive function would look like this in pseudocode:
Code:

```findWords(usedPositionsList) {   checkForRealWord(usedPositionsList)   currentSpot = usedPositionsList.front()   for i = -1 to 1     for j = -1 to 1     {       newSpot = (currentSpot.x+i, currentSpot.y+j)       if newSpot == currentSpot         continue       if not spotIsValid(newSpot)         continue       if spotAlreadyInList(usedPositionsList, newSpot)         continue       usedPositionsList.push_back(newSpot)       findWords(usedPositionsList)       usedPositionsList.pop_back()     } }```
There, all solved. You just have to write:
checkForRealWord, which looks up the word in the multi-way trie,
spotIsValid, which simply tells you if the coordinates are in the range 0..4,
spotAlreadyInList, which checks that the spot is not in the list already.
Oh and of course, translate pseudocode to C++.
• 07-19-2007
ldb88
The problem is VERY similar to a game of Boogle, but it is a little different. Yes, 'Qu' is involved. I was planning on using a character other than 'a' to 'z' to represent 'Qu' (such as '1'). But one way it differs from what manofsteel972 described is that if you use a cell for one word, you cannot use it for the others:
Code:

```A B C D E F G H I J K L M N O P Q R S T U V W X M```
If you use M (the center one) for the first word, you cannot use it for words 2 and 3, nor can you use it twice in word 1. You can, of course, use the M in the corner.

Quote:

Then you could try to find combinations of 3 compatible words from this much shorter list.
I believe that is all that is left to do. I think if I could figure out how to find the best 3 compatible words from a sorted list it would work. Everybody seems to be trying to find a way around this problem. Is that because it would be very difficult to solve or is that because it would be very easy to solve and I'm just not seeing it? Finding out whether three words are compatible is a piece of cake, but the problem is finding the 3 compatible words with the highest scores.
• 07-19-2007
iMalc
You're right, when I read your original description I must have conveniently omitted from my memory the little part that wasn't exactly the same as Boggle. (Oh and those other posts crossed with mine)

However I still think that a modification of what I proposed is the way to go. In fact if you replaced checkForRealWord with checkFor3RealWords then it would do exactly what you want.
The trick then is that checkFor3RealWords would have to check that the list describes positions that make exactly 3 real words with no characters leftover. This function in turn could call the original checkForRealWord 3 times etc... (okay might not be quite as simple as that, but that would get you most of the way there)

Also, I Wouldn't bother replacing Qu with a different character since Q will never appear without a U. Instead I would strip out the U following each Q from the word list as you read the words in. Hey you even save memory not storing the U!

I don't think that the way you want to do it is the way to do it. It is essentially working backwards which is generally much harder.