
Reinforcement Learning (Part III) - Exploration vs Exploitation

In Reinforcement Learning (RL), we often come across the words exploration and exploitation, and many articles discuss the exploration vs exploitation trade-off. What do these terms mean? Why do they matter in RL? Does this relationship have a big impact on the outcome of RL algorithms?
Figure 1 - Should I take the well-known path or try a new one? Photo by Jens Lelie on Unsplash

Exploration

Exploration is when the agent tries new states and/or actions to find out whether other state-action pairs yield a better reward from the environment. The agent can explore the whole environment, or just a sample of it, to find out which rewards it can get.

Imagine that you need to have lunch somewhere in your city. You have two options. In the first, you go to the restaurant you always go to, the one with the tasty food you like. In the second, you choose a different restaurant, and only after eating there do you find out whether the food is better, equal or worse. The second option is a process of exploration, because you discover the reward of a new state: only by experiencing it will you find out whether the new restaurant has good food for your taste. The first option, on the other hand, is exploitation.
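To make the restaurant example concrete, here is a minimal Python sketch of pure exploration. It assumes the problem can be modelled as a simple multi-armed bandit, where each restaurant is an action with a hidden mean reward. The restaurant names, the values in `TRUE_MEANS` and the Gaussian noise are hypothetical, chosen only for illustration. On every visit the agent picks a restaurant at random and updates a running average of the rewards it has observed there.

```python
import random

# Hypothetical restaurants and their true mean "tastiness" (hidden from the agent).
TRUE_MEANS = {"usual": 0.7, "new_a": 0.5, "new_b": 0.9}

def visit(restaurant, rng):
    """Simulated reward: the restaurant's true mean plus Gaussian noise."""
    return TRUE_MEANS[restaurant] + rng.gauss(0.0, 0.1)

def explore(n_visits, seed=0):
    """Pure exploration: choose a restaurant uniformly at random on every visit
    and keep a running (sample-average) estimate of each one's reward."""
    rng = random.Random(seed)
    estimates = {r: 0.0 for r in TRUE_MEANS}
    counts = {r: 0 for r in TRUE_MEANS}
    for _ in range(n_visits):
        r = rng.choice(list(TRUE_MEANS))
        reward = visit(r, rng)
        counts[r] += 1
        # Incremental mean: Q <- Q + (reward - Q) / n
        estimates[r] += (reward - estimates[r]) / counts[r]
    return estimates, counts

estimates, counts = explore(300)
for r in TRUE_MEANS:
    print(f"{r}: estimate={estimates[r]:.2f} after {counts[r]} visits")
```

With enough random visits every estimate gets close to its true mean, but the agent spends roughly a third of its lunches at each restaurant, including the ones that turn out to be worse.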

Exploitation

In this case, the agent always chooses the state-action pair with the highest known (estimated) reward, without trying to gather information about other possibilities. Going back to the example, we go to the same restaurant because we like the food there (in the agent's case, its reward is high and well known). This behaviour corresponds to a greedy approach, which always picks the best immediate reward according to what the agent currently knows.
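The greedy behaviour can be sketched in a few lines of Python. This assumes the same kind of hypothetical restaurant bandit as a modelling choice (names, `TRUE_MEANS` values and noise are illustrative, not from the post). The agent starts out liking its usual restaurant and knowing nothing about the others, then always picks the highest current estimate.

```python
import random

# Hypothetical restaurants and their true mean rewards (hidden from the agent).
TRUE_MEANS = {"usual": 0.7, "new_a": 0.5, "new_b": 0.9}

def greedy(estimates):
    """Exploitation: pick the action with the highest estimated reward.
    max() returns the first maximal key, so ties follow dict insertion order."""
    return max(estimates, key=estimates.get)

def exploit_only(n_visits, initial_estimates, seed=0):
    """Always act greedily and update a running average of observed rewards."""
    rng = random.Random(seed)
    estimates = dict(initial_estimates)
    counts = {r: 0 for r in estimates}
    for _ in range(n_visits):
        r = greedy(estimates)
        reward = TRUE_MEANS[r] + rng.gauss(0.0, 0.1)  # noisy experience
        counts[r] += 1
        estimates[r] += (reward - estimates[r]) / counts[r]
    return estimates, counts

# The agent already likes the usual restaurant; it knows nothing about the others.
estimates, counts = exploit_only(100, {"usual": 0.7, "new_a": 0.0, "new_b": 0.0})
print(counts)  # -> {'usual': 100, 'new_a': 0, 'new_b': 0}
```

Even though `new_b` is actually the best option in this made-up setup, its estimate stays at 0 forever, so a purely greedy agent never finds out.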

As you can see, the exploration vs exploitation trade-off is very important in RL algorithms. If you keep exploring the environment until you have a reward estimate for every state-action pair, the algorithm will take too long and consume too many resources. If you only exploit what you already know, you may find a restaurant with good food, but you may never taste the food you would like most, because it is served in a restaurant you never explored.

Therefore, many algorithms first explore a sample of the environment and only then begin to exploit. Others exploit most of the time and, with a small fixed probability, explore a randomly chosen state-action pair.
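The two kinds of algorithm described above are commonly known as explore-first (also called explore-then-commit) and epsilon-greedy. Below is a minimal Python sketch of both on a hypothetical restaurant bandit; the restaurant names, the `TRUE_MEANS` values, the Gaussian noise and the parameters (30 exploration steps, epsilon = 0.1, 1000 steps) are arbitrary illustrative choices, not values from the post.

```python
import random

# Hypothetical restaurant bandit: true mean rewards are hidden from the agent.
TRUE_MEANS = {"usual": 0.7, "new_a": 0.5, "new_b": 0.9}

def pull(action, rng):
    """Simulated noisy reward for choosing an action."""
    return TRUE_MEANS[action] + rng.gauss(0.0, 0.1)

def run(policy, n_steps, seed=0):
    """Run a policy on the bandit; return (total reward, visit counts)."""
    rng = random.Random(seed)
    actions = list(TRUE_MEANS)
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    total = 0.0
    for t in range(n_steps):
        a = policy(t, estimates, actions, rng)
        reward = pull(a, rng)
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # sample average
        total += reward
    return total, counts

def explore_first(n_explore):
    """Explore uniformly at random for n_explore steps, then always exploit."""
    def policy(t, estimates, actions, rng):
        if t < n_explore:
            return rng.choice(actions)
        return max(actions, key=estimates.get)
    return policy

def epsilon_greedy(epsilon):
    """Exploit most of the time; with probability epsilon pick a random action."""
    def policy(t, estimates, actions, rng):
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=estimates.get)
    return policy

for name, policy in [("explore-first (30 steps)", explore_first(30)),
                     ("epsilon-greedy (eps=0.1)", epsilon_greedy(0.1))]:
    total, counts = run(policy, 1000)
    print(f"{name}: total reward={total:.1f}, visits={counts}")
```

Explore-first spends a fixed budget up front and then commits, so a budget that is too small risks committing to the wrong restaurant. Epsilon-greedy never stops exploring, which keeps its estimates up to date but costs some reward on roughly an epsilon fraction of the steps.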
