Reinforcement Learning (RL) is a fascinating form of Machine Learning that deserves a proper explanation.
It is also a less commonly used form, partly because it has only recently been developed further and partly because it is not always applicable. But more on that later.
With supervised learning, you indicate in advance what you are looking for, for example by annotating many photos of dogs and cats: a photo with a dog is labeled "dog" and a photo with a cat is labeled "cat". With enough reference material, machine learning models can teach a computer to recognize photos of dogs and cats.
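As a minimal sketch of what that looks like in code (the feature values and labels below are made up purely for illustration; a real image classifier would work on pixels rather than two hand-picked numbers):

```python
# Minimal supervised-learning sketch with made-up data: the labels are given in advance.
from sklearn.linear_model import LogisticRegression

X_train = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]  # imaginary photo features
y_train = ["dog", "dog", "cat", "cat"]                       # labels provided by people

model = LogisticRegression()
model.fit(X_train, y_train)              # learn from the labeled examples
print(model.predict([[0.85, 0.2]]))      # most likely: ['dog']
```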
Reinforcement Learning is intended to determine, in a dynamic environment, what the best next action is. This is especially useful for robots, autonomous vehicles and games, but also for building simulations and deriving the best strategy from them. This can be applied in policy making or scientific research.
Basically, Reinforcement Learning works by rewarding and punishing. The image below shows how that works. The agent takes an action that affects its environment. The environment responds and returns a reward value; a negative reward value effectively amounts to penalty points. Depending on the reward value, the parameters of the neural network are adjusted so that the result improves a little bit, but only for that specific situation, because the same adjustment can be detrimental in another one. By playing through millions of situations in this way, the network keeps getting better, and so do the actions the system chooses. That is how a computer can be trained to world-champion level in chess or Go within days, as DeepMind's AlphaGo and AlphaZero demonstrated.
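A bare-bones sketch of that loop, assuming a hypothetical `Environment` and `Agent` (no real library, and the "learning rule" is deliberately oversimplified), might look like this:

```python
import random

class Environment:
    """Hypothetical environment: reacts to an action and returns a reward."""
    def step(self, action):
        reward = 1.0 if action == "up" else -1.0   # negative reward = penalty points
        new_state = random.random()
        return new_state, reward

class Agent:
    """Hypothetical agent: a single number stands in for the network's parameters."""
    def __init__(self):
        self.parameter = 0.0

    def choose_action(self, state):
        return "up" if state + self.parameter > 0.5 else "down"

    def learn(self, action, reward):
        # Make the action that was just taken more likely if it was rewarded,
        # less likely if it was punished.
        direction = 1.0 if action == "up" else -1.0
        self.parameter += 0.01 * reward * direction

env, agent = Environment(), Agent()
state = random.random()
for _ in range(100_000):                 # in practice: millions of situations
    action = agent.choose_action(state)
    state, reward = env.step(action)
    agent.learn(action, reward)
```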
Cons:
- Reinforcement learning is not the preferred approach for solving simple problems.
- The curse of "real world" examples. Robots, for example, are often very expensive, and letting such hardware learn through trial and error is therefore expensive as well, because the robot will regularly break down.
- Multi-tasking. Learning must take place one task at a time, because otherwise the state of what has been learned cannot be properly retained. That takes a lot of time.
It may be a good idea to explain how Reinforcement Learning works using the game of Pong as an example. This is a game with two players and a ball that you have to get into the other player's goal (see image). In the electronic version, the goalkeeper can only move up or down. You can teach a computer the game with supervised learning, so that it learns from the images what the keeper should do. But this would never make the computer better than the player who provided the examples: not all situations are included in the examples, and there is no progress because the computer never has to look for better strategies itself.
That is exactly the difference: with reinforcement learning it is possible. The neural network on the right-hand side is optimized based on the position of the ball, the opponent's action, or both. The model then adjusts its parameters to obtain a better outcome. In effect, the output is whether the keeper should move up or down in order to score a goal or to prevent one.
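To make that more concrete, here is a hedged sketch of a tiny Pong-style policy: two inputs (the ball's vertical position and the keeper's), one probability of moving up, and a REINFORCE-style nudge of the weights after each point. All names and numbers are illustrative, not an actual Pong implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=2) * 0.1       # one weight per input: ball_y, keeper_y

def policy(ball_y, keeper_y):
    """Return the probability of moving up."""
    logit = weights @ np.array([ball_y, keeper_y])
    return 1.0 / (1.0 + np.exp(-logit))

def update(ball_y, keeper_y, moved_up, reward, lr=0.01):
    """REINFORCE-style nudge: make rewarded decisions more likely."""
    global weights
    p_up = policy(ball_y, keeper_y)
    grad = (1.0 - p_up if moved_up else -p_up) * np.array([ball_y, keeper_y])
    weights += lr * reward * grad

# One imaginary rally: the ball was high, the keeper moved up, a goal was prevented.
update(ball_y=0.9, keeper_y=0.2, moved_up=True, reward=+1.0)
```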
The algorithms behind reinforcement learning go back to the Russian mathematician Andrey Andreevich Markov, whose work forms the basis of the Markov Decision Process named after him. We are not going to explain exactly how that works here. What is important to note is that Markov did his work in the early twentieth century, and the Markov Decision Process built on his ideas in the 1950s laid the foundation for today's computer world champions.
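For readers who do want a taste of it, here is a toy Markov Decision Process with made-up states, actions, transition probabilities and rewards, solved with the classic Bellman value-iteration update (the names and numbers are purely illustrative):

```python
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "start": {"safe":  [(1.0, "middle", 0.0)],
              "risky": [(0.5, "goal", 1.0), (0.5, "start", -0.1)]},
    "middle": {"safe":  [(1.0, "goal", 1.0)],
               "risky": [(1.0, "start", -0.1)]},
    "goal":  {},  # terminal state
}

gamma = 0.9                               # discount factor
values = {s: 0.0 for s in transitions}

for _ in range(100):                      # repeat the Bellman update until it settles
    for s, actions in transitions.items():
        if not actions:
            continue
        values[s] = max(
            sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )

print(values)   # the value of "start" reflects the best strategy from there
```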
As impressive as RL is, it is not always applicable; in fact, usually it is not. But there are certainly a number of areas that could take a significant step forward with it.
Some applications of Reinforcement Learning:
- Robotics. With RL models, a robot can move better in a dynamic environment, which was virtually impossible until now; we always had to tell robots exactly what to do and when.
- Gaming. This is what RL has become famous for. The first chess computer to beat a human did so not with artificial intelligence but with brute computing power, by calculating all possible moves in advance along with the potential gain of each move. With RL it is possible to have computers win far more complex games with many more options.
- Simulations, for example for policy making: if you have a mathematical model, you can find the optimum by varying all kinds of parameters.
- News and personalization of content. Clicking on an item and coming back counts as a success; failure is when none of the content shown is worthwhile (see the sketch after this list).
- Self-driving cars. Although these are technically robots, I would like to mention them separately because of the major impact they will have on society as we know it.
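As a small illustration of the reward design mentioned under "News and personalization of content", the signal might look roughly like this (the function and its values are hypothetical, not any particular recommender's API):

```python
def recommendation_reward(clicked: bool, came_back: bool) -> float:
    """Click plus a return visit counts as success; showing nothing worthwhile is a failure."""
    if clicked and came_back:
        return 1.0     # success: the reader engaged and returned
    if clicked:
        return 0.2     # partial success: a click, but no return visit
    return -1.0        # failure: none of the shown content was worth clicking

print(recommendation_reward(clicked=True, came_back=True))   # 1.0
```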
Want to know more? Click on the links to learn more about artificial intelligence, deep learning, machine learning, computer vision or the business applications.