Proximal Policy Optimization (PPO)
PPO is one of the most popular DRL algorithms. It runs reasonably fast by leveraging vector (parallel) environments and works naturally with different action spaces, so it supports a wide variety of games. It also has good sample efficiency compared to algorithms such as DQN.
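To make the vector-environment point concrete, below is a minimal sketch of how a PPO-style rollout can be collected from several environments in parallel, assuming Gymnasium's vector API; the environment id, number of environments, and rollout length are illustrative and not CleanRL's defaults.

```python
# Minimal sketch of PPO-style rollout collection with vectorized environments.
# Assumes Gymnasium's vector API; env id, num_envs, and num_steps are illustrative.
import gymnasium as gym
import numpy as np

num_envs, num_steps = 4, 128  # 4 parallel envs x 128 steps -> 512 transitions per update

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)
obs, _ = envs.reset(seed=0)

observations = np.zeros((num_steps, num_envs) + envs.single_observation_space.shape, dtype=np.float32)
rewards = np.zeros((num_steps, num_envs), dtype=np.float32)

for step in range(num_steps):
    # A trained agent would sample actions from its policy here; random actions for brevity.
    actions = envs.action_space.sample()
    next_obs, reward, terminated, truncated, info = envs.step(actions)
    observations[step] = obs
    rewards[step] = reward
    obs = next_obs

envs.close()
print(observations.shape)  # (128, 4, 4): one batched rollout for a PPO update
```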
Original paper:
- Proximal Policy Optimization Algorithms (Schulman et al., 2017)
Reference resources:
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
- What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
All our PPO implementations below are augmented with the same code-level optimizations presented in openai/baselines' PPO. See The 32 Implementation Details of Proximal Policy Optimization (PPO) Algorithm for more details.
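As a rough illustration of what these code-level optimizations look like in practice, the sketch below combines PPO's clipped surrogate objective with a few of the baselines-style details (advantage normalization, value-loss clipping, entropy bonus). The function name, argument names, and coefficient values are illustrative and do not mirror any particular file listed below.

```python
# Sketch of PPO's clipped surrogate loss plus common code-level details.
import torch

def ppo_loss(new_logprob, old_logprob, advantages, new_value, old_value, returns,
             entropy, clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    # Detail: normalize advantages within the minibatch.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Clipped surrogate policy objective.
    ratio = (new_logprob - old_logprob).exp()
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef)
    pg_loss = torch.max(pg_loss1, pg_loss2).mean()

    # Detail: also clip the value loss around the old value prediction.
    v_loss_unclipped = (new_value - returns) ** 2
    v_clipped = old_value + torch.clamp(new_value - old_value, -clip_coef, clip_coef)
    v_loss = 0.5 * torch.max(v_loss_unclipped, (v_clipped - returns) ** 2).mean()

    # Detail: entropy bonus encourages exploration.
    return pg_loss - ent_coef * entropy.mean() + vf_coef * v_loss
```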
Our single-file implementations of PPO:
- ppo.py
- ppo_atari.py
    - For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques (a sketch of the convolutional encoder appears after this list).
    - Works with Atari's pixel Box observation space of shape (210, 160, 3)
    - Works with the Discrete action space
    - Includes the 9 Atari-specific implementation details as shown in the following video tutorial
- ppo_continuous_action.py
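As referenced in the ppo_atari.py item above, here is a sketch of a Nature-style convolutional encoder of the kind used for Atari pixel observations once the raw (210, 160, 3) frames have been pre-processed into stacks of four 84x84 grayscale images. The class and variable names are illustrative rather than taken from ppo_atari.py.

```python
# Sketch of a Nature-style convolutional actor-critic encoder for pre-processed
# Atari observations. Layer sizes follow the common DQN/PPO convention.
import torch
import torch.nn as nn

class AtariEncoder(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 4 stacked frames -> 32 feature maps
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
        )
        self.actor = nn.Linear(512, n_actions)   # logits over the Discrete action space
        self.critic = nn.Linear(512, 1)          # state-value estimate

    def forward(self, x: torch.Tensor):
        # Pixel values are scaled from [0, 255] to [0, 1] before the conv stack.
        hidden = self.network(x / 255.0)
        return self.actor(hidden), self.critic(hidden)

# Example: a batch of 8 pre-processed observations of shape (4, 84, 84).
logits, value = AtariEncoder(n_actions=6)(torch.zeros(8, 4, 84, 84))
print(logits.shape, value.shape)  # torch.Size([8, 6]) torch.Size([8, 1])
```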