| | #1 |
| Registered User Join Date: Oct 2005
Posts: 27
| Best method for agent learning? |
| Crazy Glue is offline | |
| | #2 |
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| What exactly is it that you would like to learn? Because as of now there doesn't seem to be that much of a need for AI. Are you trying to create some sort of uber animal?
__________________ Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction |
| Happy_Reaper is offline | |
| | #3 |
| Registered User Join Date: Oct 2005
Posts: 27
| lol this is just my senior project that I'm working on for high school. I just want to add some method for agents to learn how to act on their own as they live. |
| Crazy Glue is offline | |
| | #4 |
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| How strong do you want it to be? And how much time are you willing to allocate to this thing? For something like a school project you could do a half-decent agent just by making it prefer moves away from its predators and towards its food. And this can be done very quickly. |
| Happy_Reaper is offline | |
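The "prefer moves away from predators and towards food" idea from post #4 can be sketched in a few lines. This is only an illustration, not code from the thread: the grid coordinates, the `score_move` and `best_move` names, and the scoring rule (distance to the nearest predator minus distance to the nearest food) are all assumptions.

```python
import math

def score_move(pos, predators, foods):
    """Higher is better: far from the nearest predator, close to the nearest food."""
    nearest_pred = min((math.dist(pos, p) for p in predators), default=0.0)
    nearest_food = min((math.dist(pos, f) for f in foods), default=0.0)
    return nearest_pred - nearest_food

def best_move(agent, moves, predators, foods):
    """Try every candidate step and keep the one with the best score."""
    candidates = [(agent[0] + dx, agent[1] + dy) for dx, dy in moves]
    return max(candidates, key=lambda c: score_move(c, predators, foods))

# Hypothetical scene: predator to the left, food to the right.
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]
choice = best_move((0, 0), MOVES, predators=[(-2, 0)], foods=[(3, 0)])
print(choice)  # (1, 0): step right, away from the predator and towards the food
```

Nothing here is learned; the preference is hard-coded, which is exactly why it is quick to build.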
| | #5 |
| Registered User Join Date: Oct 2005
Posts: 27
| Nah, this needs to be actual AI learning, not just probability actions. By the way, I'm sending this to MIT too, and I care more about them being impressed than my school. I have until the end of January to finish this thing for MIT. Since I now have study hall every day at school, I should have enough time to add some sort of AI learning. The basic simulator itself I finished in two weeks over the summer. |
| Crazy Glue is offline | |
| | #6 |
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| In that case, real learning usually starts with some sort of utility function for each action, based on a certain number of factors. The learning, then, is tweaking the weights on those factors as you go along. |
| Happy_Reaper is offline | |
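One common way to read "a utility function whose weights get tweaked" is a weighted sum of factors, nudged towards the outcome the agent actually observed. The sketch below assumes that reading; the two factors, the `step` size and the target value of 2.0 are made up for illustration and do not come from the thread.

```python
def utility(weights, features):
    """Utility of an action as a weighted sum of its factors."""
    return sum(w * x for w, x in zip(weights, features))

def update_weights(weights, features, observed, step=0.1):
    """Nudge each weight in proportion to the prediction error and its factor."""
    error = observed - utility(weights, features)
    return [w + step * error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
features = [1.0, 0.5]  # hypothetical factors, e.g. "food adjacent", "predator distance"
for _ in range(200):
    # Pretend the action kept paying off 2.0 health every time it was tried.
    weights = update_weights(weights, features, observed=2.0)

print(round(utility(weights, features), 3))  # 2.0
```

Scaling the correction by `step` is what keeps each tweak small; the prediction closes in on the observed payoff instead of jumping past it.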
| | #7 |
| Registered User Join Date: Oct 2005
Posts: 27
| I don't see how that would work though for any other agent action besides eating. How would an herbivore, for example, know that turning when a carnivore is in front of it would be a good thing? From what I've been reading, maybe reinforcement learning might be a better idea than neural networks. What do you think? |
| Crazy Glue is offline | |
| | #8 |
| Crazy Fool Join Date: Jan 2003 Location: Canada
Posts: 2,588
| > Nah, this needs to be actual AI learning, not just probability actions
Boy, are you gonna be disappointed when you take your first AI class. Have a look at Bayesian nets: it's just a bunch of probabilistic decisions where the "learning" updates the probabilities. Or neural nets, where training data defines the probabilities of activity of the nodes (neurons).
__________________ jeff.bagu.org - Terrain rendering and other random stuff |
| Perspective is offline | |
| | #9 | ||
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| Quote:
And even if you do neural nets, as Perspective said, you're going to be doing essentially the same thing. Quote:
Last edited by Happy_Reaper; 01-08-2007 at 07:22 AM. | ||
| Happy_Reaper is offline | |
| | #10 |
| Registered User Join Date: Oct 2005
Posts: 27
| Ok, I looked for a while, and I tried using temporal difference learning with neural networks, kinda like in TD-Gammon. Here's the algorithm I used:
1. The agent acts based on which output cell has the highest value.
2. Store the agent's inputs when it acted.
3. Set the reward equal to the difference in the agent's health from before it acted to its current health.
4. Re-perceive the new state of the agent.
5. Store the new output cell with the highest value.
6. Error = reward + learningRate * (new value of highest output cell) - (value of output cell from the agent's previous action)
7. Find the weights tied from any non-zero inputs of the agent when it acted to the output cell of its action, and add the error to each of those weights.
Does this sound right? The agents that do this don't really seem to be doing any better than the ones that would just evolve, maybe a bit worse even. |
| Crazy Glue is offline | |
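The seven steps in post #10 can be written out roughly as follows. This is a sketch under assumptions: the network is taken to be a single linear layer with no hidden units, and the names `q_values` and `td_step` are invented here. One naming note: the post multiplies the new value by `learningRate`, while in the usual TD notation that multiplier is called the discount factor, so the parameter is named `discount` below.

```python
def q_values(weights, inputs):
    """One linear layer: weights[a][i] links input i to output cell a."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def td_step(weights, inputs, action, reward, new_inputs, discount=0.9):
    """Apply steps 4-7 of the post to one stored (inputs, action) pair."""
    old_value = q_values(weights, inputs)[action]     # value of the action taken
    new_best = max(q_values(weights, new_inputs))     # steps 4-5: best value in new state
    error = reward + discount * new_best - old_value  # step 6
    for i, x in enumerate(inputs):                    # step 7: raw error added as-is
        if x != 0:
            weights[action][i] += error
    return error

# Hypothetical case: two inputs, two actions, health dropped by 2 after action 0.
weights = [[0.0, 0.0], [0.0, 0.0]]
error = td_step(weights, inputs=[1.0, 0.0], action=0, reward=-2.0, new_inputs=[0.0, 1.0])
print(error, weights)  # -2.0 [[-2.0, 0.0], [0.0, 0.0]]
```

Note that step 7 adds the full error to every active weight, with no step size in front of it.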
| | #11 |
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| That sounds about correct to me. TD learning doesn't necessarily guarantee good results. Also, at the outset, TD learning won't beat your evolution thing either. Think of how TD-Gammon got so good. It started terrible, but by playing for a long time it got very good. It could be that you're not carrying over the data from previous experiments to subsequent ones. |
| Happy_Reaper is offline | |
| | #12 |
| Registered User Join Date: Oct 2005
Posts: 27
| I think I'm screwing up with how the weights are adjusted. I looked at some of the agent weights in the simulator and the numbers were pretty huge. I could see how this would happen, since if the error keeps getting added to the weights, the state values increase as well, and so everything just keeps increasing or decreasing like crazy. Is there a better way to apportion them so that the weights will stay within a reasonable range? |
| Crazy Glue is offline | |
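One plausible cause of the runaway weights, offered as a guess rather than a diagnosis of the actual simulator: when the whole error is added to every active weight, the value of the state changes by the error times the number of active inputs, so the correction can overshoot and grow on each update. Scaling the change by a small step size (and by the input value) is the usual remedy. The toy below uses made-up figures (30 active inputs, reward 1, discount 0.9) and a state that repeats itself, just to show the two behaviours side by side.

```python
def run(step_size, n_inputs=30, reward=1.0, discount=0.9, updates=50):
    """Repeat the same TD update on one state and report the largest weight."""
    inputs = [1.0] * n_inputs
    weights = [0.0] * n_inputs
    for _ in range(updates):
        value = sum(w * x for w, x in zip(weights, inputs))
        # The state before and after the action is the same, to keep the demo tiny.
        error = reward + discount * value - value
        weights = [w + step_size * error * x for w, x in zip(weights, inputs)]
    return max(abs(w) for w in weights)

raw = run(step_size=1.0)      # whole error added to each weight, as in the thread
scaled = run(step_size=0.01)  # error scaled down before it touches the weights

print(raw > 1e6, scaled < 1.0)  # True True
```

In this toy the scaled version settles towards a value of reward / (1 - discount) = 10 for the state, which also hints at why the multiplier on the new value should stay below 1.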
| | #13 |
| Fear the Reaper... Join Date: Aug 2005 Location: Toronto, Ontario, Canada
Posts: 625
| Well, shouldn't you also have negative rewards? |
| Happy_Reaper is offline | |
| | #14 |
| Crazy Fool Join Date: Jan 2003 Location: Canada
Posts: 2,588
| How are you training the neural net? Are you just plugging in default values at the beginning of each simulation? |
| Perspective is offline | |
| | #15 |
| Registered User Join Date: Oct 2005
Posts: 27
| The neural nets aren't trained, they just begin with random values. And yes, there are negative rewards. The weights turn out to be really high or really low. |
| Crazy Glue is offline | |