
Reinforcement Learning (Part I) - How does it work?

Today's post is about a Machine Learning area: Reinforcement Learning (RL). This article summarises the principal types of algorithms used in reinforcement learning and gives an overview of the existing RL methods on an intuitive level. In further posts, we will go into more detail and work through code examples.
Robot - Photo by Photos Hobby on Unsplash

Like other Artificial Intelligence approaches, RL is not a new thing. The first studies and developments date back to the 1850s, with further advances in the mid-1950s, when Richard Bellman had a huge impact [1]. Today, the area keeps achieving advances and improving its results year after year. Reinforcement learning is nowadays one of the most promising ways to elicit a machine's creativity. Please note that, unlike human beings, these algorithms can gather experience from millions of parallel simulations when running on a powerful infrastructure.

Terminologies

Figure 1 - Agent-environment interaction
Agent — The learner and decision maker. The agent's goal is to maximise the cumulative reward across a sequence of actions and states.
Action — A move the agent can perform. Different environments allow the agent to perform distinct kinds of actions. The set of all valid actions in a given environment is usually called the action space.
State — A complete description of the environment at a given moment, as perceived by the agent.
Reward — The feedback the environment returns for each action/state picked by the agent, usually a scalar value.
Environment — The world where the agent lives and interacts, and where it learns and chooses which actions to perform.
Policy — The rule the agent uses to decide which actions to take. It can be deterministic or stochastic.
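To make these terms concrete, here is a minimal sketch of the agent-environment interaction loop from Figure 1. The environment (a toy chain of cells where moving right eventually reaches a rewarding goal cell) and the random policy are hypothetical examples invented for illustration, not part of any RL library:

```python
import random

class ChainEnvironment:
    """Toy environment: the agent walks along a chain of cells.

    Action 0 moves left, action 1 moves right. Reaching the last
    cell yields reward +1 and ends the episode.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Put the agent back at the leftmost cell (the initial state).
        self.state = 0
        return self.state

    def step(self, action):
        # The environment maps (state, action) -> (next state, reward, done).
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state):
    # The simplest possible (stochastic) policy: pick an action at random.
    return random.choice([0, 1])

env = ChainEnvironment()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random_policy(state)           # agent picks an action
    state, reward, done = env.step(action)  # environment responds
    total_reward += reward                  # cumulative reward the agent wants to maximise
print(total_reward)
```

Even with a purely random policy, the loop shows every piece of the vocabulary above: the agent observes a state, applies its policy to choose an action, and the environment returns the next state and a reward.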

Figure 2 - Artificial Intelligence areas. Reinforcement Learning is a sub-area of Machine Learning.
Reinforcement learning (RL) is a sub-area of Machine Learning that can be seen as the study of decision making over time, taking into account the consequences of the chosen actions and/or states. These algorithms and systems learn from their successes (and failures) through the rewards given to the agent. RL studies the agent's environment so that the agent can learn and infer which actions/states it must choose to obtain the best reward.
Figure 3 - Supervised Learning vs Reinforcement Learning Architecture. RL does not use a human trusted ground truth to calculate the cost function.
In a Supervised Learning algorithm, the training data contains the correct label/prediction, so the model is trained against the correct answer. In Reinforcement Learning, however, there is no labelled data: the agent decides what it should do to perform the given task. Hence, in the absence of a training dataset with labelled data, the agent is forced to learn from its own experience.
On the other hand, while the goal in Unsupervised Learning is to find similarities and differences between data points, in Reinforcement Learning the goal is to find the set of actions/states that maximises the agent's total cumulative reward.
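The difference in what each paradigm sees during training can be sketched with two hypothetical data records (the field names and values are invented for illustration):

```python
# Supervised learning: every training example carries the correct answer (a label).
supervised_example = {"input": [0.2, 0.7], "label": "cat"}

# Reinforcement learning: no label exists. The agent only observes a transition
# with a scalar reward and must work out good behaviour from such experience.
rl_transition = {"state": 3, "action": 1, "reward": 0.0, "next_state": 4}

print("label" in supervised_example, "label" in rl_transition)
```

A supervised model minimises the gap to the label; an RL agent has no label to compare against and instead accumulates reward signals over many transitions.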

Where does the idea of RL come from? On what scenarios did scientists base their thoughts?
Well, if we look at the way a baby, or even an adult, interacts with the environment to learn or accomplish a goal, we have the fundamentals of RL. Imagine a baby moving his arms and legs. At first, he only moves them randomly, in a stochastic way. But, as time passes, the baby starts to understand which moves/actions are correlated with certain rewards or goals.
Please note that, in this process, there is no contact with the environment through which the baby could know beforehand what kind of moves/actions he must make. Therefore, this trial and error of moves/actions produces enough information about cause-effect and action-consequence relationships. The baby then puts it all together and uses that information to reach his goal.

The Reinforcement Learning types of algorithms

Figure 4 - Types of RL algorithms
There are three types of RL algorithms:
  • Model-based
  • Value-based
  • Policy-based
In the model-based approach, the agent learns a model of the environment and then plans its actions based on it; to keep the model up to date, the agent updates it periodically. Value-based RL algorithms learn the state or state-action values and choose the action with the best value. Finally, in policy-based algorithms, the agent directly learns the (possibly stochastic) policy function that maps states to actions. In the next posts, we will look at each one in more detail.
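As a small taste of the value-based family, here is a minimal tabular Q-learning sketch. The toy problem (five states in a row, where reaching the last state pays +1) and all hyperparameter values are assumptions made for illustration; Q-learning itself will be covered properly in later posts:

```python
import random

random.seed(0)

# Toy problem: states 0..4 in a row, actions 0 (left) / 1 (right);
# reaching state 4 gives reward +1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # state-action value table

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current value estimates,
        # occasionally explore a random action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) towards reward + gamma * max_a' Q(s', a')
        target = reward + gamma * max(Q[nxt])
        Q[state][action] += alpha * (target - Q[state][action])
        state = nxt

# The greedy action in each non-terminal state, read off the learned values.
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
print(policy)
```

Note that the agent never learns a model of the environment here: it only estimates how good each state-action pair is and derives its behaviour from those values, which is exactly what distinguishes the value-based family from the model-based one.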

Figure 5 - RL taxonomy (simplified version)

For which problems are RL algorithms the best solution? Is this Artificial Intelligence sub-area the most promising one? Please let me know in the comments.


References:

  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). Retrieved from https://inst.eecs.berkeley.edu/~cs188/sp20/assets/files/SuttonBartoIPRLBook2ndEd.pdf
