Best method for agent learning?
Crazy Glue
01-05-2007, 09:45 AM
I've made an artificial life simulator where agents begin with simple, randomly generated neural networks but develop strategies and complex behaviors through natural selection. However, each agent keeps the behavior it was born with until it dies and cannot change it. I was hoping to somehow modify the neural networks, use reinforcement learning algorithms, or do something else so that they could adapt as they live. The inputs right now are whether an herbivore/carnivore/plant is to their left/right/front/proximity, and their health status. Energy is gained by eating the type of food they're supposed to. Right now there is no way for herbivores to learn that carnivores are dangerous, since any herbivore that meets one just gets eaten, but I guess I could change it so that an herbivore has a certain probability of surviving. What would be the best way to implement adaptive learning for this kind of simulator?
Happy_Reaper
01-06-2007, 08:26 AM
What exactly is it that you would like the agents to learn?
Because as of now there doesn't seem to be that much of a need for AI. Are you trying to create some sort of uber animal?
Crazy Glue
01-06-2007, 11:39 PM
lol, this is just my senior project that I'm working on for high school. I just want to add some method for agents to learn how to act on their own as they live.
Happy_Reaper
01-07-2007, 12:46 AM
How strong do you want it to be?
And how much time are you willing to allocate to this thing?
For something like a school project you could do a half-decent agent just by making it prefer moves away from its predators and towards its food. And this can be done very quickly.
Crazy Glue
01-07-2007, 11:31 AM
Nah, this needs to be actual AI learning, not just probability actions. By the way, I'm sending this to MIT too, and I care more about them being impressed than my school. I have until the end of January to finish this thing for MIT. Since I now have study hall every day at school, I should have enough time to add some sort of AI learning. The basic simulator itself I finished in two weeks over the summer.
Happy_Reaper
01-07-2007, 12:56 PM
In that case, real learning usually starts with a utility function for each action, computed as a weighted combination of a certain number of factors. The learning, then, is tweaking those weights as you go along.
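A minimal sketch of that utility idea, assuming each candidate action is scored as a weighted sum of factors. The factor names (`predator_distance`, `food_distance`), the weights, and the numbers are made up for illustration and do not come from the simulator.

```python
# Hypothetical sketch: score each action as a weighted sum of factors,
# then pick the action with the highest score.

def utility(weights, factors):
    """Weighted sum of the factor values for one candidate action."""
    return sum(weights[name] * value for name, value in factors.items())

def best_action(weights, candidates):
    """Return the name of the candidate action with the highest utility."""
    return max(candidates, key=lambda action: utility(weights, candidates[action]))

# Assumed weights: being far from a predator is good, being far from food is bad.
weights = {"predator_distance": 1.0, "food_distance": -0.5}

# Assumed factor values each action would lead to.
candidates = {
    "forward": {"predator_distance": 1.0, "food_distance": 2.0},
    "turn_left": {"predator_distance": 3.0, "food_distance": 3.0},
    "turn_right": {"predator_distance": 2.0, "food_distance": 4.0},
}

print(best_action(weights, candidates))
```

Learning in this picture means adjusting the entries of `weights` from experience instead of fixing them by hand.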
Crazy Glue
01-07-2007, 04:59 PM
I don't see how that would work though for any other agent action besides eating. How would an herbivore for example know that turning when a carnivore is in front of it would be a good thing? From what I've been reading, maybe reinforcement learning might be a better idea than neural networks. What do you think?
Perspective
01-07-2007, 09:29 PM
>Nah, this needs to be actual AI learning, not just probability actions
Boy, are you gonna be disappointed when you take your first AI class. Have a look at Bayesian nets: it's just a bunch of probabilistic decisions where the "learning" updates the probabilities. Or neural nets, where training data defines the probabilities of activities of the nodes (neurons).
Happy_Reaper
01-08-2007, 07:16 AM
>How would an herbivore for example know that turning when a carnivore is in front of it would be a good thing?
You could just put in an extra factor which represents the "distance to closest predator", and the greater that would be, the better.
And even if you do neural nets, as Perspective said, you're going to be doing essentially the same thing.
>Boy, are you gonna be disappointed when you take your first AI class
I'd agree with this one. I was quite disappointed when I discovered that modern AI is so ridiculously simplistic.
Crazy Glue
01-13-2007, 07:03 PM
OK, I looked for a while, and I tried using temporal difference learning with neural networks, kinda like in TD-Gammon. Here's the algorithm I used:
1. The agent acts based on which output cell has the highest value.
2. Store the agent's inputs when it acted.
3. Set the reward equal to the difference in the agent's health from before it acted to its current health.
4. Re-perceive the new state of the agent.
5. Store the new output cell with the highest value.
6. Error = reward + learningRate * (new value of highest output cell) - (value of output cell from the agent's previous action)
7. Find the weights tied from any non-zero inputs of the agent when it acted to the output cell of its action, and add the error to each of those weights.
Does this sound right? The agents that do this don't really seem to be doing any better than the ones that would just evolve, maybe a bit worse even.
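The seven steps above resemble Q-learning with a linear function approximator. Here is a minimal sketch of the textbook form of that update, which is not necessarily what the simulator does. It differs from the steps as posted in two assumed ways: the factor on the next state's value is a discount factor (`gamma`), and a separate step size (`alpha`) scales the change to each weight in proportion to its input. All names and default values are illustrative.

```python
# Hypothetical sketch of a TD (Q-learning style) update for a net with no
# hidden layer: weights[action][i] connects input i to that action's output.

def q_values(weights, inputs):
    """One output value per action: the weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def td_update(weights, inputs, action, reward, next_inputs, alpha=0.1, gamma=0.9):
    """Adjust the weights of the action just taken and return the TD error."""
    q_old = q_values(weights, inputs)[action]      # value of the action taken
    q_next = max(q_values(weights, next_inputs))   # best value in the new state
    error = reward + gamma * q_next - q_old        # temporal-difference error
    for i, x in enumerate(inputs):
        # Inputs that were zero when the agent acted leave their weights unchanged.
        weights[action][i] += alpha * error * x
    return error

# Two inputs, two actions, all weights starting at zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
error = td_update(weights, inputs=[1.0, 0.0], action=1, reward=1.0,
                  next_inputs=[0.0, 1.0])
print(error, weights)
```

With a step size well below 1, a single surprising reward nudges the weights rather than overwriting them.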
Happy_Reaper
01-13-2007, 09:17 PM
That sounds about correct to me. TD-Learning doesn't necessarily guarantee good results. Also, at the outset, TD-Learning won't beat your evolution thing either. Think of how TD-Gammon got so good. It started terrible, but by playing for a long time it got very good.
It could be that you're not carrying over the data from previous experiments to subsequent ones.
Crazy Glue
01-13-2007, 10:19 PM
I think I'm screwing up with how the weights are adjusted. I looked at some of the agent weights in the simulator and the numbers were pretty huge. I could see how this would happen, since if the error keeps getting added to the weights, the state values increase as well and so everything just keeps increasing or decreasing like crazy. Is there a better way to apportion them so that the weights will stay within a reasonable range?
Happy_Reaper
01-14-2007, 08:04 AM
Well, shouldn't you also have negative rewards?
Perspective
01-14-2007, 11:18 AM
How are you training the neural net? Are you just plugging in default values at the beginning of each simulation?
Crazy Glue
01-14-2007, 11:28 AM
The neural nets aren't trained, they just begin with random values. And yes, there are negative rewards. The weights turn out to be really high or really low.
Perspective
01-14-2007, 12:10 PM
The general idea of neural nets is that you train them with (a lot of) data, then you apply the trained system to the problem. You seem to be measuring your results based on the training phase. An untrained neural net performs (as you might expect) randomly.
Perspective
01-14-2007, 12:13 PM
(I can't edit my post :( ).
I don't think your simulation runs long enough to both train the net and see meaningful behaviour of it in the same run.
Perspective
01-14-2007, 12:14 PM
Also, what is the topology of your net and how did you choose it? Are you using linear or non-linear neurons? What are the input and output nodes representing?
There are a lot of theory/design issues to tackle here to use neural nets productively.
Crazy Glue
01-14-2007, 02:32 PM
If I want to, I can train the agents as long as I need to. All I'd have to do is kill off the carnivores or keep them from successfully eating, then set the energy loss for each turn to 0.
As for the neural net, the inputs are whether a plant/carnivore/herbivore is to the left/right/front/proximity of an agent, with each combination getting an input cell. The outputs are turn left, turn right, move forward, or eat. I'm not sure what the difference between linear and non-linear neurons is, but I just have the inputs with weights connecting to each of the output cells, with no hidden cells.
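A minimal sketch of the net described above, assuming one binary input cell per (kind, position) combination, which gives 3 x 4 = 12 inputs, and one output cell per action. The health input mentioned earlier in the thread is left out, and all names are made up for illustration.

```python
# Hypothetical sketch: 12 binary inputs wired straight to 4 outputs,
# with the agent taking the action whose output cell is highest.

KINDS = ("plant", "carnivore", "herbivore")
POSITIONS = ("left", "right", "front", "proximity")
ACTIONS = ("turn_left", "turn_right", "move_forward", "eat")

def encode(seen):
    """Turn a set of (kind, position) sightings into a 12-element input list."""
    return [1.0 if (kind, pos) in seen else 0.0
            for kind in KINDS for pos in POSITIONS]

def choose_action(weights, inputs):
    """weights[a][i] connects input i to output a; pick the largest output."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return ACTIONS[outputs.index(max(outputs))]

# Assumed example: the only non-zero weight links "plant in front" to "eat".
weights = [[0.0] * (len(KINDS) * len(POSITIONS)) for _ in ACTIONS]
weights[ACTIONS.index("eat")][KINDS.index("plant") * len(POSITIONS)
                              + POSITIONS.index("front")] = 1.0

print(choose_action(weights, encode({("plant", "front")})))
```

Because each output is just a weighted sum of the inputs, with nothing applied afterwards, this is what a later reply calls a single-layer linear net.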
Happy_Reaper
01-15-2007, 07:26 AM
Ok, but the data you accumulate from one experiment needs to carry on to the next one.
Crazy Glue
01-15-2007, 09:44 AM
Yeah, I can do that. My simulator can save the agents' neural nets and then reload them if needed. But that doesn't solve the problem of the really high/low neural net weights.
Happy_Reaper
01-15-2007, 10:13 AM
Ok, then decrease the amount by which you change your weights, but increase the number of experiments.
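A small sketch of why the size of each weight change matters, under a simplified assumption: a single linear output fed by several active inputs is repeatedly pulled towards a fixed target. In the real TD setting the target moves too, so this only illustrates the step-size effect. The function name and numbers are hypothetical.

```python
# Hypothetical sketch: add step * error to every active weight and see
# whether the remaining error shrinks or grows.

def final_error(step, active_inputs=3, target=1.0, updates=20):
    """Return the absolute error left after a number of updates."""
    weights = [0.0] * active_inputs          # all active inputs have value 1
    for _ in range(updates):
        error = target - sum(weights)        # the output is the sum of the weights
        weights = [w + step * error for w in weights]
    return abs(target - sum(weights))

# Adding the whole error to each weight (step = 1.0) overshoots every time.
print(final_error(step=1.0))
# A small fraction of the error converges instead.
print(final_error(step=0.1))
```

In this toy setup each update multiplies the error by (1 - step * active_inputs), so the error only shrinks when step * active_inputs stays below 2.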
Perspective
01-15-2007, 01:11 PM
>>>
The outputs are turn left, turn right, move forward, or eat. I'm not sure what the difference between linear and non-linear neurons is, but I just have the inputs with weights connecting to each of the output cells, with no hidden cells.
<<<
OK, so this is a single-layer linear net. The best way to train it is with a supervised learning procedure (i.e. test data and the "solutions" to the test cases), e.g. herbathing left, whatchamacolit right, food over there => some output. Correct answer is <some other output>. The difference between the two forms the error derivative, which you propagate.
What you're describing now sounds like unsupervised learning, and neural nets have never been particularly well suited to that. There are too many parameters to tune, and the whole system is sensitive to them.
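A minimal sketch of the supervised procedure described above, assuming the delta rule on a single-layer linear net: each weight moves by a small fraction of (correct output - actual output) times its input. The training pairs are invented for illustration.

```python
# Hypothetical sketch: delta-rule training of a single-layer linear net
# from (inputs, correct outputs) pairs.

def train_delta_rule(examples, n_inputs, n_outputs, rate=0.1, epochs=200):
    """Return weights[o][i] fitted to the labelled examples."""
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for _ in range(epochs):
        for inputs, target in examples:
            for o in range(n_outputs):
                actual = sum(w * x for w, x in zip(weights[o], inputs))
                error = target[o] - actual           # correct minus actual
                for i, x in enumerate(inputs):
                    weights[o][i] += rate * error * x
    return weights

# Assumed labelled cases: input 0 active -> output 0, input 1 active -> output 1.
examples = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
]
weights = train_delta_rule(examples, n_inputs=2, n_outputs=2)
print(weights)
```

The catch for this simulator is the `examples` list: someone has to supply the correct output for each case, which is exactly the prior knowledge a trial-and-error learner is meant to do without.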
Crazy Glue
01-15-2007, 05:08 PM
I don't want to use any training data whatsoever though. I want agents to be able to learn on their own what's good and bad for them without any prior knowledge of the environment. If there's a better way of doing this, please let me know.
Happy_Reaper
01-15-2007, 07:25 PM
To be honest, I see this as being a search-tree problem. There would be no learning involved, but your situation just screams it, almost.
I'd say right now your learning is difficult because all your agents are trying to learn at once, which will lead them to random behaviour. You need at least one that will be somewhat deterministic, or else your agents will never learn anything.
Crazy Glue
01-15-2007, 09:07 PM
NNNnnnnooo! Anything but decision trees! I'd rather go with trained neural nets if it comes to that. Well, I'm gonna try to figure this out on my own, I guess. I wanna stick with reinforcement learning. Maybe if I apportion the error based on the values of the input cells, that'll reduce the effect. I'm disappointed though. This doesn't seem to be that complex a simulator, and I thought there'd be some AI algorithm that'd do exactly what I wanted.
Happy_Reaper
01-15-2007, 10:48 PM
Although learning is an interesting field, in my experience decision trees more accurately model most situations. That's largely due, I think, to the fact that most AI problems are very specific (like finding a path between two points, not running into obstacles, etc.) and that learning therefore often teaches agents things that are not necessary to know. This usually makes them much weaker than straight decision-tree agents.
Perspective
01-16-2007, 10:19 AM
You can also "learn" using decision trees. You train the tree to identify the most relevant branching conditions.
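One common way to pick the "most relevant branching condition" is information gain: choose the feature whose split leaves the labels least mixed. The sketch below assumes that criterion, which the post does not name, and uses invented feature names and data.

```python
# Hypothetical sketch: choose the branching condition with the highest
# information gain over a handful of labelled situations.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy from splitting the rows on one feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Name of the feature that separates the labels best."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))

# Assumed data: the right action depends only on whether a carnivore is in front.
rows = [
    {"carnivore_front": True, "plant_front": False},
    {"carnivore_front": True, "plant_front": True},
    {"carnivore_front": False, "plant_front": True},
    {"carnivore_front": False, "plant_front": False},
]
labels = ["turn", "turn", "forward", "forward"]

print(best_split(rows, labels))
```

A full tree learner would apply `best_split` again to each resulting subset until the labels in every branch agree.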
Crazy Glue
01-16-2007, 06:00 PM
Yeah, but I'm not really concerned about how well an agent does what it should so much as the way it learns. I want agents to have the least preprogrammed knowledge possible. If I just decided to add some new creature to the environment, I want them to be able to learn by experience how to interact with it with little, if any, modification to the agents. I want them specifically to learn by trial and error at runtime, not with any prior training, and not by just going through a decision tree and doing the best action it finds.
Crazy Glue
01-16-2007, 09:10 PM
I just started comparing the data from the TD agents to the plain evolving agents. It actually looks like whatever I did works. When I plot the herbivores' max age vs. time, it appears that the TD agents learn to stay alive much longer. At the end of 2000 turns, the TD herbivores' max age was 1230 and the evolving herbivores' was 709.
Perspective
01-16-2007, 10:17 PM
Very nice.
Now, to wrap it up, make your experiments scientifically sound. Run your experiments many times and average to make sure you get a representative number, then repeat for different values of "turns" to plot the behaviour. Does the learning increase? Does it level off after some number of turns? Does too much learning have a negative effect (likely an error in your model), or do the networks stabilize, etc.
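A minimal sketch of that protocol: repeat each configuration with different random seeds, then report the mean and spread for several values of "turns". The `run_simulation` function is a stand-in that returns a made-up number, so only the structure carries over to the real simulator.

```python
# Hypothetical sketch: average a measurement over many seeded runs and
# repeat for several run lengths.
import random
import statistics

def run_simulation(turns, seed):
    """Placeholder for one simulator run; returns a made-up 'max age'."""
    rng = random.Random(seed)
    return rng.randint(0, turns)

def summarize(turns, runs=30):
    """Mean and sample standard deviation of max age over independent runs."""
    ages = [run_simulation(turns, seed) for seed in range(runs)]
    return statistics.mean(ages), statistics.stdev(ages)

def sweep(turn_counts, runs=30):
    """Repeat the summary for each run length, ready for plotting."""
    return {turns: summarize(turns, runs) for turns in turn_counts}

for turns, (mean_age, spread) in sweep([500, 1000, 2000]).items():
    print(turns, round(mean_age, 1), round(spread, 1))
```

Running the same sweep for both the TD agents and the plain evolving agents would show whether a gap like the one reported holds up on average or came from one lucky run.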
Happy_Reaper
01-20-2007, 07:09 AM
Those are some good results.
But as Perspective said, make sure to test your model extensively. I've seen many a learning agent "appear" to be working, but only because it was in the right circumstances.
CodeMonkey
01-21-2007, 09:10 PM
What is an optimal data structure for an ANN? A linked-list tree (that seems most logical)?