
Reinforcement Learning (Part I) - How does it work?

Today's post is about a Machine Learning area: Reinforcement Learning (RL). This article summarises the principal types of algorithms used in reinforcement learning and gives an overview of the existing RL methods on an intuitive level. In further posts, we will go into more detail and work through code examples.
Robot - Photo by Photos Hobby on Unsplash

Like other Artificial Intelligence approaches, RL is not a new thing. The first studies and developments date back to the 1850s, with further advances in the mid-1950s, when Richard Bellman had a huge impact [1]. Today, the area keeps achieving advances and improving its results year after year. Reinforcement learning is nowadays one of the most promising ways to elicit a machine's creativity. Please note that, unlike human beings, these algorithms can gather experience from millions of parallel simulations when running on a powerful infrastructure.

Terminologies

Figure 1 - Agent-environment interaction
Agent — The learner and decision maker. The agent's goal is to maximise the cumulative reward across a sequence of actions and states.
Action — A move the agent can perform. Different environments allow the agent to perform distinct kinds of actions. The set of all valid actions in a given environment is usually called the action space.
State — A complete description of the environment at a given moment, as perceived by the agent.
Reward — The feedback the environment returns for each action/state picked by the agent, usually a scalar value.
Environment — The world where the agent lives and interacts, and where it learns and chooses which actions to perform.
Policy — The rule the agent uses to decide which actions to take. It can be deterministic or stochastic.
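To make these terms concrete, here is a minimal sketch of the agent-environment interaction loop from Figure 1. The environment (a toy chain of cells where moving right eventually reaches a rewarding goal cell) and the random policy are hypothetical examples invented for illustration, not part of any RL library:

```python
import random

class ChainEnvironment:
    """Toy environment: the agent walks along a chain of cells.

    Action 0 moves left, action 1 moves right. Reaching the last
    cell yields reward +1 and ends the episode.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Put the agent back at the leftmost cell (the initial state).
        self.state = 0
        return self.state

    def step(self, action):
        # The environment maps (state, action) -> (next state, reward, done).
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state):
    # The simplest possible (stochastic) policy: pick an action at random.
    return random.choice([0, 1])

env = ChainEnvironment()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random_policy(state)           # agent picks an action
    state, reward, done = env.step(action)  # environment responds
    total_reward += reward                  # cumulative reward the agent wants to maximise
print(total_reward)
```

Even with a purely random policy, the loop shows every piece of the vocabulary above: the agent observes a state, applies its policy to choose an action, and the environment returns the next state and a reward.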

Figure 2 - Artificial Intelligence areas. Reinforcement Learning is a sub-area of Machine Learning.
Reinforcement learning (RL) is a sub-area of Machine Learning that can be seen as the study of decision making over time, taking into account the consequences of the chosen actions and/or states. These algorithms and systems learn from their successes (and failures) through the rewards given to the agent. RL studies the agent's environment so that the agent can learn and infer which actions/states it must choose to obtain the best reward.
Figure 3 - Supervised Learning vs Reinforcement Learning Architecture. RL does not use a human trusted ground truth to calculate the cost function.
In a Supervised Learning algorithm, the training data contains the correct label/prediction, so the model is trained against the correct answer. In Reinforcement Learning, however, there is no labelled data: the agent decides what it should do to perform the given task. Hence, in the absence of a training dataset with labelled data, the agent is forced to learn from its own experience.
On the other hand, while the goal in Unsupervised Learning is to find similarities and differences between data points, in Reinforcement Learning the goal is to find the set of actions/states that maximises the agent's total cumulative reward.
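The difference in what each paradigm sees during training can be sketched with two hypothetical data records (the field names and values are invented for illustration):

```python
# Supervised learning: every training example carries the correct answer (a label).
supervised_example = {"input": [0.2, 0.7], "label": "cat"}

# Reinforcement learning: no label exists. The agent only observes a transition
# with a scalar reward and must work out good behaviour from such experience.
rl_transition = {"state": 3, "action": 1, "reward": 0.0, "next_state": 4}

print("label" in supervised_example, "label" in rl_transition)
```

A supervised model minimises the gap to the label; an RL agent has no label to compare against and instead accumulates reward signals over many transitions.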

Where does the idea of RL come from? On what scenarios did scientists base their thoughts?
Well, if we look at the way a baby, or even an adult, interacts with the environment to learn or accomplish a goal, we have the fundamentals of RL. Imagine a baby moving his arms and legs. At first, he only moves them randomly, in a stochastic way. But, as time passes, the baby starts to understand which moves/actions are correlated with certain rewards or goals.
Please note that, in this process, there is no contact with the environment through which the baby could know beforehand what kind of moves/actions he must make. Therefore, this trial and error of moves/actions produces enough information about cause-effect and action-consequence relationships. The baby then puts it all together and uses that information to reach his goal.

The Reinforcement Learning types of algorithms

Figure 4 - Types of RL algorithms
There are three types of RL algorithms:
  • Model-based
  • Value-based
  • Policy-based
In the model-based approach, the agent learns a model of the environment and then plans its actions based on it; to keep the model up to date, the agent updates it periodically. Value-based RL algorithms learn the state or state-action values and choose the action with the best value. Finally, in policy-based algorithms, the agent directly learns the (possibly stochastic) policy function that maps states to actions. In the next posts, we will look at each one in more detail.
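As a small taste of the value-based family, here is a minimal tabular Q-learning sketch. The toy problem (five states in a row, where reaching the last state pays +1) and all hyperparameter values are assumptions made for illustration; Q-learning itself will be covered properly in later posts:

```python
import random

random.seed(0)

# Toy problem: states 0..4 in a row, actions 0 (left) / 1 (right);
# reaching state 4 gives reward +1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # state-action value table

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current value estimates,
        # occasionally explore a random action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) towards reward + gamma * max_a' Q(s', a')
        target = reward + gamma * max(Q[nxt])
        Q[state][action] += alpha * (target - Q[state][action])
        state = nxt

# The greedy action in each non-terminal state, read off the learned values.
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
print(policy)
```

Note that the agent never learns a model of the environment here: it only estimates how good each state-action pair is and derives its behaviour from those values, which is exactly what distinguishes the value-based family from the model-based one.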

Figure 5 - RL taxonomy (simplified version)

For which problems are RL algorithms the best solution? Is this Artificial Intelligence sub-area the most promising one? Please let me know in the comments.


References:

  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). Retrieved from https://inst.eecs.berkeley.edu/~cs188/sp20/assets/files/SuttonBartoIPRLBook2ndEd.pdf
