
Reinforcement Learning (Part III) - Exploration vs Exploitation

In the Reinforcement Learning field, we often come across the words exploration and exploitation, and many articles discuss the exploration vs exploitation trade-off. What do these terms mean? Why do they matter in RL? Does this trade-off have a big impact on the outcome of RL algorithms?
Figure 1 - Should I choose the well-known path or try a new one? Photo by Jens Lelie on Unsplash

Exploration

Exploration is when the agent tries new states and/or actions to find out whether other state-action pairs yield a better reward from the environment. The agent can explore the whole environment, or only a sample of it, to find out which rewards it can get.

Imagine you need to have lunch somewhere in your city. You have two options. In the first, you go to the same restaurant you always go to, with that tasty food you like. In the second, you choose a different restaurant, and only after being there do you find out whether the food is better, equal, or worse. The second option is a process of exploration: you learn the reward of a new state, and only by experiencing it will you find out whether the new restaurant has good food for your taste. The first option, on the other hand, is the exploitation process.
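Exploration is how the agent learns what each option is worth. One common way to keep such an estimate (an assumption here, not something this post prescribes) is the incremental sample average: after each visit, the estimate moves a fraction 1/n towards the reward just received. A minimal Python sketch, using a hypothetical `update_estimate` helper and made-up ratings from 0 to 10:

```python
def update_estimate(estimate: float, visits: int, reward: float) -> tuple[float, int]:
    """Return the new average reward and visit count after one more visit."""
    visits += 1
    # Incremental form of the sample mean: Q_new = Q_old + (R - Q_old) / n
    estimate += (reward - estimate) / visits
    return estimate, visits


# Hypothetical ratings (0-10) from three visits to a new restaurant.
estimate, visits = 0.0, 0
for rating in [7.0, 9.0, 8.0]:
    estimate, visits = update_estimate(estimate, visits, rating)

print(estimate, visits)  # 8.0 3
```

The incremental form gives the same result as averaging all past ratings, but it only needs the current estimate and the visit count.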

Exploitation

In this case, the agent always chooses the state-action pair with the highest known (estimated) reward, without trying to gather information about other possibilities. Going back to the example, we go to the same restaurant because we like the food there (in the agent's case, that reward is high and already well known). This behaviour corresponds to a greedy approach, where we always pick the best immediate reward we already know.
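A minimal sketch of greedy (pure exploitation) selection, assuming the agent keeps one reward estimate per option in a dictionary. The restaurant names and values are hypothetical:

```python
def greedy_action(estimates: dict[str, float]) -> str:
    """Pick the option with the highest current reward estimate (pure exploitation)."""
    # max() returns the first key with the largest value, so ties go to the earliest key.
    return max(estimates, key=estimates.get)


# Hypothetical average ratings learned so far; the new places were never tried.
estimates = {"usual_place": 8.0, "new_thai": 0.0, "new_pizza": 0.0}
print(greedy_action(estimates))  # usual_place
```

Because the untried restaurants still have an estimate of 0.0, a purely greedy agent never visits them, even if one of them would turn out to be better.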

As you can see, the exploration vs exploitation trade-off is of great importance in RL algorithms. If you always explore the environment until you have reward estimates for every state-action pair, the algorithm will take too long and consume too many resources. If you only exploit what you already know, you may find a restaurant with good food, but you may never taste the food you would like best, because it is served in a restaurant you never explored.

Therefore, many algorithms first explore a sample of the environment and only then begin to exploit. Others exploit most of the time and, with a small probability, explore a random state-action pair instead (the epsilon-greedy strategy is a well-known example).
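Both families can be sketched on a toy multi-armed bandit, where each arm stands for a restaurant. This is an illustration under stated assumptions, not the method of any particular library or paper: the true mean rewards, the Gaussian reward noise with standard deviation 1, and the helper names (`pull`, `best`, `explore_then_commit`, `epsilon_greedy`) are all hypothetical. `explore_then_commit` tries every arm a fixed number of times and then always exploits; `epsilon_greedy` exploits most of the time and explores a random arm with probability `epsilon`.

```python
import random


def pull(arm, true_means, estimates, counts, rng):
    """Sample a noisy reward for `arm` and update its running-average estimate."""
    reward = rng.gauss(true_means[arm], 1.0)  # assumed: Gaussian noise, std 1
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return reward


def best(estimates):
    """Index of the highest estimate (ties go to the lowest index)."""
    return max(range(len(estimates)), key=lambda i: estimates[i])


def explore_then_commit(true_means, steps, explore_per_arm, rng):
    """Explore every arm a fixed number of times (round-robin), then exploit the best."""
    n = len(true_means)
    estimates, counts = [0.0] * n, [0] * n
    for t in range(steps):
        arm = t % n if t < explore_per_arm * n else best(estimates)
        pull(arm, true_means, estimates, counts, rng)
    return estimates, counts


def epsilon_greedy(true_means, steps, epsilon, rng):
    """Exploit most of the time; explore a random arm with probability epsilon."""
    n = len(true_means)
    estimates, counts = [0.0] * n, [0] * n
    for _ in range(steps):
        arm = rng.randrange(n) if rng.random() < epsilon else best(estimates)
        pull(arm, true_means, estimates, counts, rng)
    return estimates, counts


# Hypothetical "restaurants": the third one has the best average food.
true_means = [5.0, 6.0, 8.0]
print(explore_then_commit(true_means, 1000, 20, random.Random(0))[1])
print(epsilon_greedy(true_means, 1000, 0.1, random.Random(0))[1])
```

Setting `epsilon` to 0.0 turns `epsilon_greedy` into pure exploitation: with all estimates starting at 0.0, it keeps returning to the first arm it tried and never discovers the better third one. Larger `epsilon` values explore more, at the cost of more visits to worse arms.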
