Wednesday, September 14, 2011

Reinforcement Learning

Reinforcement Learning is an area of Machine Learning that gives computer applications a way to mimic human behaviour. It is based on the concept of rewarding an Agent that has to do something. Usually any realistic task is very hard for an Agent, because it doesn't understand what it is doing. In fact, it has to learn the task from scratch. To get a sense of what we are talking about, one can imagine a baby who has to learn to stand up. Usually a baby needs more than 15 months before he manages this, and during this learning phase a lot of trials are needed to accomplish the task.




Actually, what the baby's brain is trying to do in this phase is to train its neural network in order to understand which is the right combination of spikes to send through the spinal column. It's easy to see that this is a very hard task, because no instructions are provided to the baby; only genetics drives this kind of self-learning. The baby learns to stand up by finding out what happens when he falls down. He feels pain when he falls and he feels good when he is standing. So the key to this learning is the negative or positive reward. In the same way, one can imagine a horse that receives a sugar cube when it has accomplished a task, to tell the horse that it did something well.

For an Agent, finding the way out of a maze can be a hard task too. The only information it has is that it can move up, down, left or right and that it can't go through a wall. If it reaches a cell with an apple it receives a positive reward, and if it tries to pass through a wall it receives a negative reward. This is the same raw technique used by an unaware person who tries to get out of a maze where he has never been before.
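To make the rules concrete, here is a minimal sketch of such a maze in Python. The layout, the reward values and the names `MAZE`, `MOVES` and `step` are my own assumptions for illustration; they are not taken from the Qmaze project.

```python
# Hypothetical maze layout: '#' is a wall, '.' a free cell, 'A' the apple.
MAZE = [
    "#####",
    "#..A#",
    "#.#.#",
    "#...#",
    "#####",
]

# The four moves the agent knows about, as (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Assumed reward values; the real project may use different numbers.
APPLE_REWARD = 1.0
WALL_PENALTY = -1.0


def step(state, move):
    """Apply one move and return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = MOVES[move]
    new_row, new_col = row + d_row, col + d_col
    cell = MAZE[new_row][new_col]
    if cell == "#":
        # The agent can't go through a wall: it stays where it is and is punished.
        return state, WALL_PENALTY, False
    if cell == "A":
        # Reaching the apple gives the positive reward and ends the episode.
        return (new_row, new_col), APPLE_REWARD, True
    return (new_row, new_col), 0.0, False
```

For example, `step((3, 1), "left")` bumps into the wall and leaves the agent at `(3, 1)` with a negative reward, while `step((1, 2), "right")` lands on the apple. The maze is surrounded by walls, so the agent can never step outside the grid.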
  
Qmaze is a project developed for an Intelligent Systems course that tries to simulate the learning explained above. It uses the Q-learning algorithm: initially it blindly explores cells until it finds the apple (the exit, or Terminal State) for the first time. In the following episodes it explores the maze again, but when it finds cells that are part of the way out it exploits them.
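The explore-then-exploit behaviour described above can be sketched with tabular Q-learning and an epsilon-greedy choice of moves. This is not the Qmaze source code: the maze, the rewards, the parameters `ALPHA`, `GAMMA` and `EPSILON`, and all function names are assumptions chosen for illustration.

```python
import random

# Hypothetical maze: '#' wall, '.' free cell, 'A' apple (Terminal State).
MAZE = ["#####",
        "#..A#",
        "#.#.#",
        "#...#",
        "#####"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed learning rate, discount, exploration rate


def step(state, action):
    """Return (next_state, reward, done) for one move; reward values are assumed."""
    row, col = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if MAZE[row][col] == "#":
        return state, -1.0, False      # bumped into a wall: stay put, negative reward
    if MAZE[row][col] == "A":
        return (row, col), 1.0, True   # found the apple: positive reward, episode ends
    return (row, col), 0.0, False


def best_action(q, state, rng=None):
    """Greedy action for a state; ties are broken at random when rng is given."""
    values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    best = [a for a, v in enumerate(values) if v == max(values)]
    return rng.choice(best) if rng else best[0]


def train(episodes=500, start=(3, 1), seed=0):
    """Run Q-learning episodes and return the learned table {(state, action): value}."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = start, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.randrange(len(ACTIONS))   # explore blindly
            else:
                action = best_action(q, state, rng)    # exploit what is known
            next_state, reward, done = step(state, action)
            # Value of the best move from the next cell (zero after a Terminal State).
            future = 0.0 if done else q.get((next_state, best_action(q, next_state)), 0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + ALPHA * (reward + GAMMA * future - old)
            state = next_state
    return q


def greedy_path(q, start=(3, 1), max_steps=20):
    """Follow the learned table without exploring and return the visited cells."""
    path, state, done = [start], start, False
    while not done and len(path) <= max_steps:
        state, _, done = step(state, best_action(q, state))
        path.append(state)
    return path
```

Calling `greedy_path(train())` follows the learned values from the start cell to the apple. In the first episode every value is zero, so the agent wanders at random; once the apple has been found, the reward spreads backwards one cell per update, and later episodes mostly follow it.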

To show how it works, I made some videos of the agent in action:


 


The Agent in action (in this instance a blue ball) is trying to find two different Terminal States with different rewards. So the Agent first has to learn the way to the apples and then choose the apple with the larger reward.
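How the choice between two apples comes out can be sketched with a little arithmetic. Assuming a Q-learning agent with a discount factor, no reward on ordinary moves, and the apple's reward given on the move that reaches it, an apple collected on the n-th move is worth `gamma ** (n - 1) * reward` at the start cell. The discount factor, the rewards and the distances below are made-up numbers, not values from the videos.

```python
GAMMA = 0.9  # assumed discount factor


def discounted_value(reward, steps, gamma=GAMMA):
    """Value at the start cell of an apple reached on the `steps`-th move."""
    return gamma ** (steps - 1) * reward


def preferred_apple(apples, gamma=GAMMA):
    """Pick the apple name with the highest discounted value.

    `apples` maps a name to a (reward, steps) pair; both are hypothetical inputs.
    """
    return max(apples, key=lambda name: discounted_value(*apples[name], gamma))


# A small apple close to the start versus a bigger apple further away.
apples = {"small": (1.0, 3), "big": (5.0, 8)}
choice = preferred_apple(apples)
```

With these numbers the bigger apple wins even though it is further away. Under the same assumptions, a much smaller discount factor would make the nearer apple look better, so the larger reward is preferred only when it is not discounted away by the distance.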
