My inspiration for this project was born out of my two biggest interests in the coding world: video game development and AI. I was also heavily inspired by the YouTube video "Training AI to Play Pokemon with Reinforcement Learning", in which an AI was trained to complete part of Pokemon Red, and I wanted to do something similar myself but on a smaller scale. This is where the idea for the bird game was born: it has only one output, jump or don't jump, which keeps the scope of the project small enough for the 15-day timeline I set for myself. I chose to combine a genetic evolution algorithm with reinforcement learning, and I decided to use our school's game engine TGE so I could have full control of the project and work in C++.
For this project I decided to make a video showcase of the entire learning process for the simulated players. The video also offers an interesting extra: a visual depiction of the neural network connections being affected in real time. A brief breakdown of how the project works is provided at the end of the video, but the full breakdown is here on the website.
When taking on this task I first had to pick which game engine or solution to use as my canvas, and I chose TGE, my school's in-house game engine written in C++. The reasoning behind this is that I feel most comfortable using C++, and TGE has easy sprite rendering capabilities, which leaves more time for coding the AI. Using a game I coded myself also gives me a huge advantage when it comes to rewarding well-performing simulated players, since I can award fitness score directly in the game code.
The neural network I am using is based entirely on weights, and on computing those weights to make decisions. It is built from three layers: an input layer, a hidden layer, and an output layer.
Into the input layer we feed the variables that help the player make assessments in the next layer. The inputs I decided to use were the player's velocity, the distance between the player and the next obstacle, and finally the height of the gap between the obstacles. These input values are then passed on to the hidden layer.
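To give a concrete picture, the three inputs could be gathered into something like the struct below. The names and layout are my own assumptions for illustration, not lifted from the project code.

```cpp
#include <array>

// Hypothetical container for the three inputs the network sees each frame.
struct NetworkInputs
{
    float playerVelocity;     // the player's current vertical velocity
    float distanceToObstacle; // horizontal distance to the next obstacle
    float gapHeight;          // height of the gap between the obstacles

    // Packing the inputs as an array makes them easy to feed to the network.
    std::array<float, 3> ToArray() const
    {
        return { playerVelocity, distanceToObstacle, gapHeight };
    }
};
```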
The hidden layer's job is to process all the input and create its own values, which it can alter in order to learn different behaviours. The hidden layer is where the magic happens, and it is where the algorithms I go into further below do their work.
And then finally there is the output layer, where the decisions about the outputs are made. In my case there is only one output, and it is binary: jump or don't jump. My implementation takes the information from the hidden layer and runs it through a sigmoid function to get a final float value between 0 and 1. We then check whether the output value is above a certain threshold, and the player jumps accordingly.
Now for the weights: these are float values sitting between the layers, and they are essentially the connections. They matter most around the hidden layer, since it connects to both the input and the output layers and mediates between them.
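Putting the three layers and the weights together, a minimal feedforward pass could look like the sketch below. The hidden layer size, the threshold value, and all the names are assumptions on my part; the real project may be wired differently.

```cpp
#include <array>
#include <cmath>

constexpr int   kInputs        = 3;
constexpr int   kHidden        = 4;    // assumed hidden layer size
constexpr float kJumpThreshold = 0.5f; // assumed decision threshold

// The weights are plain floats forming the connections between layers.
struct Brain
{
    std::array<std::array<float, kInputs>, kHidden> inputToHidden{};
    std::array<float, kHidden> hiddenToOutput{};
};

// Sigmoid squashes any value into the range (0, 1).
float Sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Forward pass: inputs -> hidden -> single jump/no-jump output.
bool ShouldJump(const Brain& brain, const std::array<float, kInputs>& inputs)
{
    std::array<float, kHidden> hidden{};
    for (int h = 0; h < kHidden; ++h)
    {
        float sum = 0.0f;
        for (int i = 0; i < kInputs; ++i)
            sum += brain.inputToHidden[h][i] * inputs[i];
        hidden[h] = Sigmoid(sum);
    }

    float output = 0.0f;
    for (int h = 0; h < kHidden; ++h)
        output += brain.hiddenToOutput[h] * hidden[h];

    // Jump only if the squashed output clears the threshold.
    return Sigmoid(output) > kJumpThreshold;
}
```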
An integral part of any neural network is how it learns and evolves. For my project I am using reinforcement learning and genetic evolution.
Reinforcement learning means that the AI learns from its mistakes and adapts according to a fitness score: you reward or punish the AI depending on its performance. For example, I punished my players if they lost by going outside the screen boundaries, since this discourages them from staying dormant. For the rewarding, I gave my players a higher fitness score for every frame they stayed alive. This encourages the players to keep playing and therefore learn to play better.
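In code, this reward-and-punish loop can be very small. Here is a rough sketch of a per-frame fitness update; the penalty size and the Player fields are assumptions for illustration.

```cpp
// A rough sketch of one simulated player; the fields are assumed.
struct Player
{
    float y = 0.0f;       // vertical position on screen
    float fitness = 0.0f; // accumulated score used for selection
    bool  alive = true;
};

// Called once per frame for each simulated player.
void UpdateFitness(Player& player, float screenHeight)
{
    if (!player.alive)
        return;

    // Reward: surviving another frame raises fitness, which
    // encourages players that keep playing.
    player.fitness += 1.0f;

    // Punishment: leaving the screen ends the run and costs fitness,
    // which discourages staying dormant or drifting off screen.
    if (player.y < 0.0f || player.y > screenHeight)
    {
        player.fitness -= 50.0f; // assumed penalty size
        player.alive = false;
    }
}
```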
Genetic evolution is the process of picking out the highest-performing players and then combining and mutating them to find the most optimal set of weights. In my case I took the two players with the highest fitness scores, combined their weights into a new set, and then gave that set a small chance of mutations. The mutations set new randomized values on some weights in the hope of finding improved ones. After combining and mutating the weights, we give the new set to all the simulated players except the parents, improving them.
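A sketch of that breeding step could look like the following. The flat weight vector, the uniform 50/50 crossover, and the mutation range are my own assumptions here.

```cpp
#include <random>
#include <vector>

// Combine the two best weight sets and apply random mutations.
std::vector<float> Crossover(const std::vector<float>& parentA,
                             const std::vector<float>& parentB,
                             float mutationChance,
                             std::mt19937& rng)
{
    std::uniform_real_distribution<float> chance(0.0f, 1.0f);
    std::uniform_real_distribution<float> newWeight(-1.0f, 1.0f);

    std::vector<float> child(parentA.size());
    for (std::size_t i = 0; i < child.size(); ++i)
    {
        // Each weight is inherited from one parent at random...
        child[i] = (chance(rng) < 0.5f) ? parentA[i] : parentB[i];

        // ...with a small chance of being replaced by a fresh random
        // value, which lets the search stumble onto improved weights.
        if (chance(rng) < mutationChance)
            child[i] = newWeight(rng);
    }
    return child;
}
```

Keeping the mutation chance small preserves most of what the parents learned while still leaving room to explore new weights.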
By using both of these methods we can effectively create a system that learns and improves a shared brain, or network, that the simulated players use.
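Tied together, one generation of this shared-brain system might look roughly like the sketch below. RunGame is assumed to drive the game until every player has died, Crossover is the sketch from above, and Player is assumed to carry its weights and fitness.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Extending the earlier Player sketch with a weight vector.
struct Player
{
    std::vector<float> weights;
    float fitness = 0.0f;
};

// Assumed to exist elsewhere: runs the game loop until every player
// dies, updating each player's fitness along the way.
void RunGame(std::vector<Player>& population);

// The Crossover sketch from the genetic evolution section above.
std::vector<float> Crossover(const std::vector<float>& a,
                             const std::vector<float>& b,
                             float mutationChance, std::mt19937& rng);

// One full generation: simulate, select the two fittest, breed, reassign.
void RunGeneration(std::vector<Player>& population, std::mt19937& rng)
{
    RunGame(population);

    // Sort so the two highest-fitness players come first.
    std::sort(population.begin(), population.end(),
              [](const Player& a, const Player& b)
              { return a.fitness > b.fitness; });

    std::vector<float> childWeights =
        Crossover(population[0].weights, population[1].weights,
                  0.05f /* assumed mutation chance */, rng);

    // Everyone except the two parents receives the new weights.
    for (std::size_t i = 2; i < population.size(); ++i)
        population[i].weights = childWeights;
}
```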
The results of this project are, in my opinion, nothing less than amazing. Across multiple runs from scratch, the AI learns very fast and becomes experienced enough to complete the game in just minutes. It has also taught me an immense amount about algorithms and how AI networks function, which is precisely why I chose this type of task.
As for possible improvements, there are definitely some that come to mind if you would like to take the project to the next level.
Larger data model: simulate more players, or speed up the time it takes to do a single run.
Add more nodes: with more input nodes the AI could figure out more precisely how to navigate the obstacles, and with more hidden nodes it would have more room to make complex decisions.
Functionality for saving the weights to a serialized data object, such as a JSON document, for editing and observing them outside of runtime; a small sketch of this idea follows below.
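For example, with a third-party JSON library such as nlohmann/json (my pick for this sketch, not something the project necessarily uses), saving and loading the weights could look like this:

```cpp
#include <fstream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp> // third-party single-header JSON library

// Save a flat weight vector so it can be inspected or hand-edited.
void SaveWeights(const std::vector<float>& weights, const std::string& path)
{
    nlohmann::json j;
    j["weights"] = weights;

    std::ofstream file(path);
    file << j.dump(4); // pretty-print with 4-space indentation
}

// Load the weights back at startup instead of learning from scratch.
std::vector<float> LoadWeights(const std::string& path)
{
    std::ifstream file(path);
    nlohmann::json j;
    file >> j;
    return j["weights"].get<std::vector<float>>();
}
```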