Reinforcement Learning - introduction

Reinforcement learning agent solving the inverted pendulum problem. The code used to produce this gif can be found here.

Reinforcement learning is an area of machine learning that deals with how an agent ought to take actions in an environment in order to maximize a reward. Unlike other areas of machine learning, such as supervised learning (think voice recognition or image recognition), reinforcement learning is well suited for the control and automation of physical processes.

As a control engineer, I think reinforcement learning could very well render all of my classical and modern control theory knowledge obsolete.

To the left you can see a classic control problem called the inverted pendulum. This problem is akin to trying to balance a tennis racquet or similar object in the palm of your hand. I could use my knowledge of control theory to design a solution that would keep the pendulum upright. If I implemented a PID controller, I could start plugging in gains and do a lot of manual tuning, or I could follow something more structured, such as the Ziegler-Nichols method. If I wanted to go really in depth, I could start modelling and experimenting to find values for system parameters, such as the mass of the cart or the pole, and use these to aid my controller design.
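To give a sense of what the hand-designed route involves, here is a minimal sketch of a discrete PID controller whose gains come from the classic Ziegler-Nichols rule. The names `PID` and `ziegler_nichols_pid` and the example values for the ultimate gain and oscillation period are illustrative assumptions, not taken from the code behind the gif.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative, not tuned for any real cart)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        # Accumulate the integral term and estimate the derivative
        # with a simple backward difference.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def ziegler_nichols_pid(ku: float, tu: float) -> PID:
    """Classic Ziegler-Nichols PID rule: Kp = 0.6*Ku, Ti = Tu/2, Td = Tu/8.

    ku is the ultimate gain (where the loop oscillates steadily) and
    tu is the period of that oscillation; both must be found by experiment.
    """
    kp = 0.6 * ku
    ki = kp / (tu / 2.0)  # integral gain, Kp / Ti
    kd = kp * (tu / 8.0)  # derivative gain, Kp * Td
    return PID(kp=kp, ki=ki, kd=kd)


# Hypothetical values of ku and tu, chosen only for illustration.
pid = ziegler_nichols_pid(ku=10.0, tu=2.0)
force = pid.update(error=1.0, dt=0.1)  # error = pole angle away from upright
```

Even in this stripped-down form, the ultimate gain and oscillation period still have to be measured on the actual system, which is where most of the time goes.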

Regardless of which path I took, none of them would give me a workable solution in a short period of time, despite all of my knowledge and experience.

Reinforcement learning, on the other hand, doesn't require any prior knowledge (of the system or of control theory). All of the work in reinforcement learning is focused on creating agents that can learn from experience. Essentially, the agent will play around in the environment and learn what works and what doesn't. The main human involvement is in setting the rewards. This is how you communicate to the agent what you want to happen. For the inverted pendulum example seen on the left, the agent received a large negative reward every time the pendulum fell past a certain angle. For every time step this didn't happen, it received a small positive reward (the longer the pendulum stays upright, the more of these small positive rewards it collects).
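The reward scheme described above can be sketched in a few lines. The angle limit and the reward magnitudes below are assumed values chosen for illustration; the post does not state the actual numbers used.

```python
import math

# Assumed values, for illustration only.
ANGLE_LIMIT = math.radians(12)  # pole counts as "fallen" beyond this angle
STEP_REWARD = 1.0               # small positive reward per upright time step
FALL_PENALTY = -100.0           # large negative reward when the pole falls


def reward(pole_angle: float) -> float:
    """Reward for one time step, given the pole angle from vertical in radians."""
    if abs(pole_angle) > ANGLE_LIMIT:
        return FALL_PENALTY
    return STEP_REWARD


def episode_return(pole_angles: list[float]) -> float:
    """Sum the rewards over an episode, stopping as soon as the pole falls."""
    total = 0.0
    for angle in pole_angles:
        r = reward(angle)
        total += r
        if r == FALL_PENALTY:
            break
    return total


# Three upright steps collect three small rewards.
upright = episode_return([0.0, 0.05, 0.1])
# Falling on the second step ends the episode with a large penalty.
fallen = episode_return([0.0, 0.5, 0.0])
```

Nothing here tells the agent how to balance the pole. It only scores the outcome, and the agent has to discover a strategy that keeps the total as high as possible.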

Below you can see the reinforcement learning agent at various stages of learning. Once the agent plays around in its environment long enough, it will eventually be able to keep the pendulum upright indefinitely.

eps3.gif