# Thread: Reinforcement Learning, ConnectFour game

1. ## Reinforcement Learning, ConnectFour game

 Hello! I've recently tried to solve the ConnectFour game using reinforcement learning (the SARSA algorithm). I'm familiar with the method, but I have a big problem with the state space of this game. I represent the board as a matrix (with 6 rows and 7 columns) and I need a method to group similar matrices together in order to minimize the state space (otherwise the learning algorithm will not learn). I tried an artificial neural network: a back-propagation network with 126 input nodes (3 for every cell in the matrix -> corresponding to Red, Black or free cell), 84 neurons in the hidden layer, and 42 nodes in the output layer. Unfortunately, to work with back-propagation I need to know the classes of the training examples, but I don't, and the classification is very poor. Maybe you can help me with some hints. Also, I want to know whether this ANN is good enough and what other methods I can use to classify the states (in order to minimize the state space)? Thanks a lot, MrRed
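For concreteness, the 3-inputs-per-cell encoding described above could look like the sketch below. The cell values (0 = empty, 1 = Red, 2 = Black) and the function name `encode_board` are my own assumptions, not something fixed by the post:

```c
#include <string.h>

#define ROWS 6
#define COLS 7

/* Assumed cell values: 0 = empty, 1 = Red, 2 = Black. */
enum { EMPTY = 0, RED = 1, BLACK = 2 };

/* One-hot encode the 6x7 board into 126 network inputs:
 * three inputs per cell, exactly one of them set to 1.0f. */
void encode_board(const int board[ROWS][COLS], float inputs[ROWS * COLS * 3])
{
    memset(inputs, 0, sizeof(float) * ROWS * COLS * 3);
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            inputs[(r * COLS + c) * 3 + board[r][c]] = 1.0f;
}
```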

2. What does the storage have to do with the classification algorithms? Are you storing the "ANN" state directly?

How are you actually storing the data? You didn't actually say anything about the data structures you are using to store data.

You tried to classify a game of "Connect 4" with "ANN"? What state do you need to store? How "fuzzy" is the output?

Why does your output layer have 42 nodes? "Connect 4" is influenced by gravity; you play a column and the piece falls to the lowest cell.

Soma

3. I need classification in order to use the SARSA algorithm (reinforcement learning).
I use an ANN to classify the states (that is, to group similar states) because, in the naive version, there are about 4.5T possible states (where T = 10^12). As I said before, with so many states SARSA cannot learn anything.
For storing the data (in fact, the board) I use a matrix with 6 rows and 7 columns; that's why the network has 42 output nodes.
But maybe this is not the best approach, and I need some hints.

4. O_o

You misunderstood my questions. You didn't say anything new. I'm trying to get more information out of you so that I can help you do it instead of simply doing it for you. I'm not going to do it for you. (Call it a personal fault.) So, we kind of have an issue here.

If and when you come back, I am, again, looking for new information. You said you "use a matrix with 6 rows and 7 columns." Are you, for example, simply storing an array (`float mData[6][7];`) for every "ANN" result?

Again, I can't help without knowing more about what you are actually doing!

Soma

5. Yes, I use an array... I have 42 nodes (6x7)... I don't know what is so hard to understand. I don't want code from you, only some ideas. Again, I'm not saying that an ANN will solve this problem, but I know this kind of network can classify things... or maybe we can find a surjective function to do the minimization of game boards.

6. I'm not at all sure exactly what you're asking - it seems infinitely more to do with AI than anything to do with C at all, and AI isn't my area. But:

Wouldn't the minimisation of the game board just be the list of moves made, by column number? At most 42 moves will be made, so for the very last (filled-up) board you do no worse than an array storing the state of the board, but in ALL OTHER CASES you'll actually have a smaller state (the list of moves up to then - each an integer from 1 to 7, and never more than 42 of them - which comes out to 21 integers per game board on average). With a bit of "compression" you could get it down to 3 bits per move (and thus never take more than 16 bytes total to store any board/game, and about 8 bytes per board position on average).
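A sketch of that 3-bits-per-move packing (the function names and the fixed 16-byte buffer are my own choices; moves are assumed to be columns 1..7, stored as 0..6):

```c
#include <stdint.h>

/* Pack a game (a list of column choices, 1..7, at most 42 of them)
 * into 3 bits per move: 42 * 3 = 126 bits, so 16 bytes suffice. */
void pack_moves(const int *moves, int n, uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = 0;
    for (int i = 0; i < n; ++i) {
        int bit = i * 3;
        uint8_t v = (uint8_t)(moves[i] - 1);        /* store 0..6 */
        out[bit / 8] |= (uint8_t)(v << (bit % 8));
        if (bit % 8 > 5)                            /* value straddles a byte */
            out[bit / 8 + 1] |= (uint8_t)(v >> (8 - bit % 8));
    }
}

/* Read back move number i (0-based), returning the column 1..7. */
int unpack_move(const uint8_t in[16], int i)
{
    int bit = i * 3;
    int v = in[bit / 8] >> (bit % 8);
    if (bit % 8 > 5)
        v |= in[bit / 8 + 1] << (8 - bit % 8);
    return (v & 7) + 1;
}
```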

Not to mention that boards can be horizontal mirrors of each other, but that's quite tricky to detect if you only store the moves (and probably the pruning of the game tree by this sort of detection would cost more than it would save).
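If you do keep full boards, detecting mirror duplicates is straightforward; a sketch (the canonical-form tiebreak via `memcmp` is my own choice, and `board[r][c]` is assumed to hold 0/1/2 for empty/Red/Black):

```c
#include <string.h>

#define ROWS 6
#define COLS 7

/* Reflect the board left-to-right: column c maps to column 6 - c. */
void mirror_board(const int in[ROWS][COLS], int out[ROWS][COLS])
{
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            out[r][COLS - 1 - c] = in[r][c];
}

/* Canonical form: whichever of the board and its mirror compares
 * smaller byte-wise, so a position and its reflection share one key. */
void canonical_board(const int in[ROWS][COLS], int out[ROWS][COLS])
{
    int m[ROWS][COLS];
    mirror_board(in, m);
    if (memcmp(in, m, sizeof m) <= 0)
        memcpy(out, in, sizeof m);
    else
        memcpy(out, m, sizeof m);
}
```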

And, to my knowledge, classic Connect 4 is solved (since about the 80's?) and there are myriad papers on it that would go into all sorts of classifications of similar positions for you.

7. I appreciate your answer, but it's the same thing... I have no problems with memory (I do not need compression or anything like that). SARSA cannot learn because there are too many states.

8. Originally Posted by MrRed
I appreciate your answer, but it's the same thing... I have no problems with memory (I do not need compression or anything like that). SARSA cannot learn because there are too many states.
Which is probably why SARSA isn't used for games like connect 4, chess, checkers, etc.

Why don't you swing by the talkchess or openchess programming forums? That's where the chess programmers mostly talk shop, and you can get a few opinions on it from them. These are high-level programmers, so bring your geek hat, definitely!

9. Originally Posted by MrRed
In fact, this is a back-propagation network with 126 input nodes (3 for every cell in the matrix -> coresponding to Red,Black or free cell).
I would call that a problem. Whatever algorithm you use should not be capable of representing an invalid board state. If it is then you're forcing it to learn both how to play well and what a valid board state is simultaneously. You'd also fail to teach it how gravity affects the placement of pieces.
By using a representation where valid state information and the effect of gravity are built into the model, you'll have much less for it to actually learn.

So accounting for gravity, there are 1 + 2 + 2*2 + 2*2*2 + 2*2*2*2 + 2*2*2*2*2 + 2*2*2*2*2*2 = 127 possible states for each column, not 3*3*3*3*3*3 = 729, and even that's still assuming we don't stop after a win. It might at first seem like a small difference, but after working it out, clearly you're forcing it to learn based on more than 5 times the amount of actual information held within the valid board states.
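That 127-state count suggests a compact per-column encoding; a sketch (the indexing scheme and `encode_column` name are my own, derived from the counting above):

```c
/* Encode one column as an index 0..126.
 * col[0..h-1] are the pieces from the bottom up, each 0 (red) or 1 (black);
 * h is the number of pieces (0..6).  There are 2^h states of height h,
 * and 2^h - 1 states of all smaller heights, so:
 *   index = (2^h - 1) + (the pieces read as an h-bit number). */
int encode_column(const int *col, int h)
{
    int bits = 0;
    for (int i = 0; i < h; ++i)
        bits |= col[i] << i;
    return ((1 << h) - 1) + bits;
}
```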

Also, without personally knowing anything about this SARSA algorithm, I can surmise that it will need to learn the relationship between the column it chooses, the board state at the time it chose, and whether the move turned out to be good or bad. The output from the algorithm will thus be just the column choice, not a choice from a 6*7 grid. So in that sense you're probably asking it to produce 6 times more output data than is required.
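To illustrate the shape of that interface - a single column choice out of 7, not a 6*7 grid - here is a minimal sketch, assuming a per-state array of 7 Q-values (the function names and the tabular layout are my assumptions, not a prescribed SARSA design):

```c
#include <stdlib.h>

#define NUM_COLS 7

/* Epsilon-greedy choice over the 7 columns, given one state's Q-values:
 * with probability epsilon pick a random column, otherwise the best one. */
int choose_column(const double q[NUM_COLS], double epsilon)
{
    if ((double)rand() / RAND_MAX < epsilon)
        return rand() % NUM_COLS;              /* explore */
    int best = 0;
    for (int a = 1; a < NUM_COLS; ++a)
        if (q[a] > q[best])
            best = a;
    return best;                               /* exploit */
}

/* One SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)),
 * where q_s2a2 is the value of the action actually taken in the next state. */
void sarsa_update(double *q_sa, double q_s2a2, double reward,
                  double alpha, double gamma)
{
    *q_sa += alpha * (reward + gamma * q_s2a2 - *q_sa);
}
```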

My background in AI is more along the lines of genetic algorithms and minimax, rather than this particular algorithm, so take this with a grain of salt... Lets just say that I believe I can somewhat see why it isn't working.

10. Originally Posted by iMalc
So accounting for gravity, there are 1 + 2 + 2*2 + 2*2*2 + 2*2*2*2 + 2*2*2*2*2 + 2*2*2*2*2*2 = 127 possible states for each column, not 3*3*3*3*3*3 = 729, and even that's still assuming we don't stop after a win. It might at first seem like a small difference, but after working it out, clearly you're forcing it to learn based on more than 5 times the amount of actual information held within the valid board states.
Could you explain how you calculated those possible states?
Originally Posted by iMalc
Also, without personally knowing anything about this SARSA algorithm, I can surmise that it will need to learn the relationship between the column it chooses, the board state at the time it chose, and whether the move turned out to be good or bad. The output from the algorithm will thus be just the column choice, not a choice from a 6*7 grid. So in that sense you're probably asking it to produce 6 times more output data than is required.

My background in AI is more along the lines of genetic algorithms and minimax, rather than this particular algorithm, so take this with a grain of salt... Lets just say that I believe I can somewhat see why it isn't working.
Well... generally, SARSA takes a matrix representing the board and returns the column that is the best move for that board. To do that, SARSA needs a smaller number of states... that's why I tried to get 6*7 outputs.

11. Originally Posted by MrRed
Could you explain how you calculated those possible states?
Sure.
That's 1 for an empty column.
Plus 2 for having one item in the column (red or black).
Plus 2*2 for having two items in the column (red or black for either one)
Plus 2^3 for having three items in the column (red or black for any of them)
...
Plus 2^6 for having six items in the column (red or black for any of them)
So from the above, evaluate the sum of 2^N for N from 0 to 6, which equals 2^7 - 1 = 127.

Although even that allows you to represent, say, five reds one atop another, which wouldn't occur in a real game since it would be game over before then. Also, once you take multiple columns into account, there might be a win of length greater than four across columns. I'm not too worried about those cases, as they are much harder to avoid representing.

Well... generally, SARSA takes a matrix representing the board and returns the column that is the best move for that board. To do that, SARSA needs a smaller number of states... that's why I tried to get 6*7 outputs.
Okay, well I might look up SARSA when I get home tonight, if I have time.

12. Well... generally, SARSA takes a matrix representing the board and returns the column that is the best move for that board. To do that, SARSA needs a smaller number of states... that's why I tried to get 6*7 outputs.
That isn't the general approach to SARSA and "Connect4"; as I already said, that approach is flawed.

Your network should be capable of correctly reducing play to a column refinement. You only need the complete state of the board for later refinement, if you even do that, as a snapshot; you don't need the complete state of the board as input to the SARSA algorithm. The SARSA algorithm can't learn from mutating any point on the board because at any given time at most 7 plays can be made.
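A sketch of that "at most 7 plays" point - move generation with gravity built in (the function names are my own; `board[0]` is assumed to be the bottom row, with 0 meaning empty):

```c
#define ROWS 6
#define COLS 7
#define EMPTY 0

/* Drop a piece into a column, honoring gravity: it lands on the lowest
 * empty cell.  Returns the row it landed in, or -1 if the column is full. */
int drop_piece(int board[ROWS][COLS], int col, int player)
{
    for (int r = 0; r < ROWS; ++r) {
        if (board[r][col] == EMPTY) {
            board[r][col] = player;
            return r;
        }
    }
    return -1;
}

/* Collect the legal moves: at most 7, one per non-full column. */
int legal_moves(const int board[ROWS][COLS], int moves[COLS])
{
    int n = 0;
    for (int c = 0; c < COLS; ++c)
        if (board[ROWS - 1][c] == EMPTY)
            moves[n++] = c;
    return n;
}
```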

Soma
