Best method for agent learning?

This is a discussion on Best method for agent learning? within the General AI Programming forums, part of the Cprogramming.com and AIHorizon.com's Artificial Intelligence Boards category; Ive made an artificial life simulator where agents begin with simple, randomly generated neural networks but develop strategies and complex ...

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    27

    Best method for agent learning?

    I've made an artificial life simulator where agents begin with simple, randomly generated neural networks but develop strategies and complex behaviors through natural selection. However, their behaviors are fixed for life and can't change until they die. I was hoping to somehow modify the neural networks, use reinforcement learning algorithms, or do something so that they could adapt as they lived. The inputs right now are whether an herbivore/carnivore/plant is to the agent's left/right/front/in proximity, plus its health status. Energy is gained by eating the type of food each agent is supposed to eat. There is no way for herbivores to judge whether or not carnivores are dangerous right now, since they will just be eaten, but I guess I could change it so that an herbivore has a certain probability of surviving. What would be the best way to implement adaptive learning for this kind of simulator?

  2. #2
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    What exactly is it that you would like the agents to learn?

    Because as of now there doesn't seem to be much need for AI. Are you trying to create some sort of uber-animal?
    Teacher: "You connect with Internet Explorer, but what is your browser? You know, Yahoo, Webcrawler...?" It's great to see the educational system moving in the right direction

  3. #3
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    lol, this is just my senior project that I'm working on for high school. I just want to add some method for agents to learn how to act on their own as they live.

  4. #4
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    How strong do you want it to be?

    And how much time are you willing to put into this thing?

    For something like a school project you could get a half-decent agent just by making it prefer moves away from its predators and toward its food. And that can be done very quickly.
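    For illustration, a minimal reflex rule like that might look like the sketch below — score each candidate move by "closer to food is better, closer to the predator is worse" and pick the best one. All names and the 2-D setup here are invented for the example, not taken from the original simulator:

    ```cpp
    #include <array>
    #include <cassert>
    #include <cmath>

    // Hypothetical 2-D positions for an agent, its food, and a predator.
    struct Vec { double x, y; };

    static double dist(Vec a, Vec b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    }

    // Score a candidate position: closer to food is better,
    // closer to the predator is worse.
    static double score(Vec pos, Vec food, Vec predator) {
        return -dist(pos, food) + dist(pos, predator);
    }

    // Pick the best of four one-step moves (E/W/N/S) by score.
    Vec bestMove(Vec agent, Vec food, Vec predator) {
        std::array<Vec, 4> moves = {{
            {agent.x + 1, agent.y}, {agent.x - 1, agent.y},
            {agent.x, agent.y + 1}, {agent.x, agent.y - 1}
        }};
        Vec best = moves[0];
        for (Vec m : moves)
            if (score(m, food, predator) > score(best, food, predator))
                best = m;
        return best;
    }

    int main() {
        // Food to the east, predator to the west: the agent steps east.
        Vec next = bestMove({0, 0}, {5, 0}, {-5, 0});
        assert(next.x == 1 && next.y == 0);
        return 0;
    }
    ```

    Note there's no learning here at all — the rule is fixed — which is why this only makes sense for a quick project.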

  5. #5
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    Nah, this needs to be actual AI learning, not just probability-based actions. By the way, I'm sending this to MIT too, and I care more about impressing them than my school. I have until the end of January to finish this thing for MIT. Since I now have study hall every day at school, I should have enough time to add some sort of AI learning. I finished the basic simulator itself in two weeks over the summer.

  6. #6
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    In that case, real learning usually starts with some sort of utility function over actions, based on a certain number of factors. The learning, then, is tweaking the weights as you go along.
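    One sketch of that idea (all factor names invented for illustration): utility is a weighted sum of observed factors, and "learning" nudges each weight according to how the chosen action worked out — here, by the change in health it produced.

    ```cpp
    #include <cassert>

    // Hypothetical factors an agent might observe for a candidate action.
    struct Factors {
        double foodAhead;        // 1 if food is in front, else 0
        double predatorAhead;    // 1 if a predator is in front, else 0
        double distToPredator;   // distance to the nearest predator
    };

    // Utility is a weighted sum of the factors; "learning" tweaks the
    // weights after each action based on the resulting health change.
    struct UtilityAgent {
        double wFood = 0.0, wPredator = 0.0, wDist = 0.0;

        double utility(const Factors& f) const {
            return wFood * f.foodAhead
                 + wPredator * f.predatorAhead
                 + wDist * f.distToPredator;
        }

        // Move each weight in the direction suggested by the outcome:
        // actions that raised health reinforce their active factors.
        void learn(const Factors& f, double healthDelta, double rate) {
            wFood     += rate * healthDelta * f.foodAhead;
            wPredator += rate * healthDelta * f.predatorAhead;
            wDist     += rate * healthDelta * f.distToPredator;
        }
    };

    int main() {
        UtilityAgent a;
        // Eating food raised health: the foodAhead weight should rise.
        a.learn({1, 0, 0}, +5.0, 0.1);
        // Walking toward a predator dropped health: its weight should fall.
        a.learn({0, 1, 0}, -5.0, 0.1);
        assert(a.wFood > 0 && a.wPredator < 0);
        return 0;
    }
    ```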

  7. #7
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    I don't see how that would work for any agent action besides eating, though. How would an herbivore, for example, know that turning when a carnivore is in front of it is a good thing? From what I've been reading, maybe reinforcement learning would be a better idea than neural networks. What do you think?

  8. #8
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    >Nah, this needs to be actual AI learning, not just probability actions

    Boy, are you gonna be disappointed when you take your first AI class. Have a look at Bayesian nets: they're just a bunch of probabilistic decisions where the "learning" updates the probabilities. Or neural nets, where training data determines the activation behavior of the nodes (neurons).
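    About as simple as that probabilistic "learning" gets is a count-based estimate (a Beta-style update), which an herbivore could keep for, say, its chance of surviving a carnivore encounter. A sketch, with invented names:

    ```cpp
    #include <cassert>
    #include <cmath>

    // Count-based probability estimate with a Laplace prior: the
    // "learning" is nothing more than updating counts from observations.
    struct Estimate {
        double successes = 1.0, trials = 2.0;  // prior: p starts at 0.5
        double p() const { return successes / trials; }
        void observe(bool outcome) {
            trials += 1.0;
            if (outcome) successes += 1.0;
        }
    };

    int main() {
        Estimate survives;  // P(herbivore survives a carnivore encounter)
        // Eight survivals in a row pull the estimate from 0.5 up to 0.9.
        for (int i = 0; i < 8; ++i) survives.observe(true);
        assert(std::fabs(survives.p() - 0.9) < 1e-9);
        return 0;
    }
    ```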

  9. #9
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    > How would an herbivore, for example, know that turning when a carnivore is in front of it is a good thing?

    You could just add an extra factor representing the "distance to closest predator": the greater that distance, the better.

    And even if you use neural nets, as Perspective said, you'll essentially be doing the same thing.

    > Boy, are you gonna be disappointed when you take your first AI class.

    I'd agree with that. I was quite disappointed when I discovered that modern AI is so ridiculously simplistic.
    Last edited by Happy_Reaper; 01-08-2007 at 07:22 AM.

  10. #10
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    OK, I looked around for a while, and I tried using temporal difference learning with neural networks, kind of like in TD-Gammon. Here's the algorithm I used:

    1. The agent acts based on which output cell has the highest value.
    2. Store the agent's inputs at the moment it acted.
    3. Set the reward equal to the difference between the agent's health before it acted and its current health.
    4. Re-perceive the new state of the agent.
    5. Store the new output cell with the highest value.
    6. error = reward + discountFactor * (value of the new highest output cell) - (value of the output cell from the agent's previous action)
    7. Find the weights from each non-zero input at the time the agent acted to the output cell of its action, and add the error to each of those weights.

    Does this sound right? The agents that do this don't really seem to be doing any better than the ones that just evolve; maybe even a bit worse.
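    For reference, here is a compilable sketch of those steps for a linear value function (names are illustrative, not from the actual simulator). One detail worth noting: the coefficient on the next state's value is conventionally the discount factor, and the weight update in step 7 is usually scaled by a separate learning rate — adding the raw error directly is exactly the kind of thing that makes weights blow up.

    ```cpp
    #include <cassert>
    #include <vector>

    // TD(0)-style update for a linear value function over inputs,
    // roughly following the numbered steps above.
    struct TDLearner {
        std::vector<double> w;   // one weight per input
        double alpha = 0.1;      // learning rate (scales the update)
        double gamma = 0.9;      // discount factor (scales the next value)

        explicit TDLearner(size_t n) : w(n, 0.0) {}

        double value(const std::vector<double>& x) const {
            double v = 0;
            for (size_t i = 0; i < w.size(); ++i) v += w[i] * x[i];
            return v;
        }

        // error = reward + gamma * V(next) - V(prev);
        // each weight moves by alpha * error * its input.
        void update(const std::vector<double>& prev,
                    const std::vector<double>& next, double reward) {
            double error = reward + gamma * value(next) - value(prev);
            for (size_t i = 0; i < w.size(); ++i)
                w[i] += alpha * error * prev[i];
        }
    };

    int main() {
        TDLearner td(2);
        // A state that repeatedly yields +1 reward: its weight should
        // rise toward the discounted return 1/(1-gamma) = 10, not diverge.
        for (int i = 0; i < 200; ++i)
            td.update({1, 0}, {1, 0}, 1.0);
        assert(td.w[0] > 0 && td.w[0] < 10.0);
        return 0;
    }
    ```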

  11. #11
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    That sounds about right to me. TD learning doesn't necessarily guarantee good results, and at the outset it won't beat your evolution approach either. Think of how TD-Gammon got so good: it started out terrible, but by playing for a very long time it became very strong.

    It could also be that you're not carrying over the learned weights from previous runs to subsequent ones.

  12. #12
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    I think I'm screwing up how the weights are adjusted. I looked at some of the agent weights in the simulator and the numbers were huge. I can see how that would happen: if the raw error keeps getting added to the weights, the state values increase as well, so everything just keeps increasing or decreasing without bound. Is there a better way to scale the updates so that the weights stay within a reasonable range?
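    Two common fixes for that (a sketch, not specific to the original simulator): scale each update by a small learning rate instead of adding the raw error, and clip the weights to a fixed band as a backstop. Normalizing the reward itself (e.g. health delta divided by maximum health) also keeps the error small.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <vector>

    // Keep weights in a sane range: scale the update by a learning
    // rate, then hard-clip each weight to [-bound, +bound]. (C++17 for
    // std::clamp; names are illustrative.)
    void updateWeights(std::vector<double>& w,
                       const std::vector<double>& inputs,
                       double error, double alpha = 0.05,
                       double bound = 10.0) {
        for (size_t i = 0; i < w.size(); ++i) {
            w[i] += alpha * error * inputs[i];       // scaled, not raw, error
            w[i] = std::clamp(w[i], -bound, bound);  // clip as a backstop
        }
    }

    int main() {
        std::vector<double> w = {0.0};
        // Even a huge error applied 1000 times leaves the weight bounded.
        for (int i = 0; i < 1000; ++i)
            updateWeights(w, {1.0}, 100.0);
        assert(w[0] <= 10.0);
        return 0;
    }
    ```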

  13. #13
    Fear the Reaper...
    Join Date
    Aug 2005
    Location
    Toronto, Ontario, Canada
    Posts
    625
    Well, shouldn't you also have negative rewards?

  14. #14
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    How are you training the neural net? Are you just plugging in default values at the beginning of each simulation?

  15. #15
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    The neural nets aren't trained; they just begin with random values. And yes, there are negative rewards. The weights still end up either really high or really low.
