Super Mario Bros Reinforcement Learning

Watch the computer learn how to play one of the most iconic video games of all time!

We use Reinforcement Learning (RL), a subfield of Machine Learning, to teach the computer how to play the game. In RL, we reinforce the behaviors we want the computer, i.e. our agent, to exhibit. Think about training a dog to perform a trick, where the goal is for the dog to complete the whole trick. If the trick is an obstacle course, the trainer gives the dog a treat every time it completes part of the course, signaling that it is on the right track and should go further next time for more treats. If the dog fails at the obstacle in front of it, the trainer withholds the treat. Eventually, the dog learns which actions earn it a treat and which ones do not, and it completes the course. The computer learns in a very similar fashion.

In this scenario, the computer is given a positive reward for moving right, a negative reward for standing still, and an even more negative reward for dying. Using neural networks, it starts to learn which actions to take in order to maximize its rewards, just like our dog!
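The reward scheme described above can be sketched as a small function. This is only an illustration, not the project's actual code: the function name `shaped_reward`, its parameters, and the numeric values are all assumptions chosen to show the ordering (moving right is good, standing still is bad, dying is worse).

```python
def shaped_reward(prev_x, curr_x, died,
                  move_reward=1.0, idle_penalty=-1.0, death_penalty=-15.0):
    """Return a reward for one step of play.

    prev_x, curr_x: the agent's horizontal position before and after the step.
    died: True if the agent lost a life on this step.
    The three reward values are hypothetical; only their ordering matters here.
    """
    if died:
        # Dying is penalized more heavily than standing still.
        return death_penalty
    if curr_x > prev_x:
        # Moving right earns a positive reward.
        return move_reward
    # No rightward progress. This sketch treats moving left the same as
    # standing still, which is an assumption the text does not cover.
    return idle_penalty


print(shaped_reward(prev_x=40, curr_x=43, died=False))  # moved right
print(shaped_reward(prev_x=40, curr_x=40, died=False))  # stood still
print(shaped_reward(prev_x=40, curr_x=43, died=True))   # died
```

The agent never sees these rules directly. It only observes the number returned after each action and has to work out which behavior produces the larger totals.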

All footage below is the computer playing the game with no human intervention. An "iteration" is one playthrough of the game, regardless of whether the computer beats the level or dies. The agent uses the Double Deep Q-Learning algorithm.
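For readers curious about what sets Double Deep Q-Learning apart, here is a minimal sketch of its training target for a single transition. It uses plain Python lists in place of neural network outputs. The function name `double_dqn_target`, the argument names, and the discount factor `gamma=0.9` are assumptions for illustration and are not taken from this project's code.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.9):
    """Compute the Double DQN learning target for one transition.

    reward: reward received for the step.
    done: True if the playthrough ended on this step.
    next_q_online: action values for the next state from the online network.
    next_q_target: action values for the next state from the target network.
    gamma: discount factor (hypothetical value).
    """
    if done:
        # No future rewards once the playthrough has ended.
        return reward
    # The online network chooses the best next action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network supplies the value of that action.
    return reward + gamma * next_q_target[best_action]


target = double_dqn_target(
    reward=1.0,
    done=False,
    next_q_online=[0.2, 0.9, 0.1],  # online network prefers action 1
    next_q_target=[0.5, 0.3, 0.8],  # target network's value for action 1 is 0.3
)
print(target)
```

Splitting action selection and action evaluation between two networks is the core idea: it reduces the tendency of standard Q-learning to overestimate action values.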

The code for this project was inspired by the following paper:

Zero iterations

At the very beginning, with no experience, the computer clearly doesn't know much about the game. It only knows to go right without jumping; it doesn't even know that a Goomba is deadly.

1,000 iterations

After 1,000 iterations, the computer knows how to jump and avoid the Goombas, but unfortunately it falls into the hole.

5,000 iterations

After 5,000 iterations, the computer maneuvers with more confidence, but it still makes some mistakes that lead to its demise.

10,000 iterations

After 10,000 iterations, the computer finally completes the level! It knows which objects on the screen are obstacles and what to do when approaching one.