This thesis explores the effect of applying curriculum learning to agents trained with the deep reinforcement learning method Advantage Actor-Critic (A2C). Agents are trained in a game created for the thesis, based on the battle royale genre. Agents in a battle royale game must be able to navigate, explore, collect weapons, and eliminate opponents. The two tasks in focus are navigating and shooting. The thesis explores how different architectures affect the performance of agents in a non-competitive environment: it compares having a separate network for each task against having a single network handling both tasks, and it examines the effect of using a 2D convolutional layer as the first input layer to extract features, instead of a fully connected layer.
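As background on the A2C method named above: A2C updates the policy using an advantage estimate, the difference between the discounted return observed in a rollout and the critic's value estimate. A minimal NumPy sketch of that computation follows; the function name, the discount factor, and the rollout shape are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def n_step_returns(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns and advantages for one A2C rollout.

    rewards: r_0 .. r_{T-1} observed during the rollout.
    values: critic estimates V(s_0) .. V(s_{T-1}).
    bootstrap_value: V(s_T), used to bootstrap past the rollout end.
    """
    T = len(rewards)
    returns = np.zeros(T)
    running = bootstrap_value
    for t in reversed(range(T)):
        # R_t = r_t + gamma * R_{t+1}, seeded with V(s_T) at the end.
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage A(s_t, a_t) = R_t - V(s_t): how much better the outcome
    # was than the critic expected. Positive advantages reinforce the
    # actions taken; negative ones discourage them.
    advantages = returns - np.asarray(values)
    return returns, advantages
```

The actor's loss then weights each action's log-probability by its advantage, while the critic is regressed toward the returns.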
Having two separate networks, each controlling one task, proved necessary for shooting but not for navigation. In general, the networks using a 2D convolutional layer performed better than the networks using only fully connected layers. Curriculum learning had a large effect on the networks learning to shoot, to the point where networks trained without curriculum learning stopped shooting entirely. Curriculum learning proved less useful for navigation in terms of collecting weapons, but helped agents avoid dying to the danger zone. In environments with sparse rewards and frequent penalties, such as shooting, a combination of curriculum learning and task-separated networks proved necessary.
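Curriculum learning, as used here, means presenting easier versions of a task early in training and gradually increasing difficulty. A minimal sketch of such a schedule follows; the stage parameters (target distance, moving targets) and all names are hypothetical examples for a shooting task, not the thesis's actual curriculum.

```python
def curriculum_stage(episode, stages, episodes_per_stage):
    """Map a training episode index to a curriculum stage.

    stages: ordered task configurations, easiest first. Difficulty
    advances every `episodes_per_stage` episodes and clamps at the
    final (full-difficulty) stage.
    """
    index = min(episode // episodes_per_stage, len(stages) - 1)
    return stages[index]

# Hypothetical shooting curriculum: targets start close and static,
# then move farther away and begin to move.
shooting_stages = [
    {"target_distance": 5, "target_moves": False},
    {"target_distance": 15, "target_moves": False},
    {"target_distance": 30, "target_moves": True},
]
```

With sparse rewards, such a schedule gives the agent early successes (and thus gradient signal) that the full-difficulty task alone would rarely provide, which matches the observation that agents trained without it stopped shooting.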
This project was developed in C# for my Master's thesis at the IT University of Copenhagen, over a span of twelve weeks, by a team of three.
(To avoid artifacts please let the video run without skipping ahead)