If you have any questions or want to report a bug, please open an issue instead of emailing me directly.
Modularized implementation of popular deep RL algorithms in PyTorch.
Easy switch between toy tasks and challenging games.
Implemented algorithms include DQN, C51, QR-DQN, Rainbow, DDPG, TD3, and PPO, among others.
The DQN agent, as well as C51, QR-DQN and Rainbow, has an asynchronous actor for data generation and an asynchronous replay buffer for transferring data to GPU. Using 1 RTX 2080 Ti and 3 threads, the DQN agent runs for 10M steps (40M frames, 2.5M gradient updates) for Breakout within 6 hours.
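The asynchronous actor/replay pipeline described above can be sketched generically: a background thread produces transitions while the learner samples batches from a shared buffer. This is a minimal illustration using only the Python standard library, not this repo's actual implementation; the class and function names here are hypothetical, and the fake transitions stand in for real environment rollouts.

```python
import random
import threading

class ReplayBuffer:
    """Thread-safe FIFO replay buffer (hypothetical, illustrative only)."""
    def __init__(self, capacity=10000):
        self.data = []
        self.capacity = capacity
        self.lock = threading.Lock()

    def add(self, transition):
        # Drop the oldest transition once capacity is reached.
        with self.lock:
            if len(self.data) >= self.capacity:
                self.data.pop(0)
            self.data.append(transition)

    def sample(self, batch_size):
        # Uniformly sample a training batch.
        with self.lock:
            return random.sample(self.data, min(batch_size, len(self.data)))

def actor(buffer, n_steps, stop_event):
    """Background actor: generates placeholder transitions
    (stands in for environment rollouts)."""
    for step in range(n_steps):
        if stop_event.is_set():
            break
        transition = (step, random.random())  # (state, reward) placeholder
        buffer.add(transition)

# The actor fills the buffer in a background thread; the learner
# (here, the main thread) samples batches concurrently.
buffer = ReplayBuffer()
stop = threading.Event()
t = threading.Thread(target=actor, args=(buffer, 1000, stop))
t.start()
t.join()
batch = buffer.sample(32)
print(len(batch))  # → 32
```

In the real agent the actor runs on separate threads and the buffer additionally moves batches to the GPU asynchronously, which is what lets the DQN agent reach the throughput quoted above.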
See Dockerfile and requirements.txt for more details. examples.py contains examples for all the implemented algorithms, and Dockerfile contains the environment for generating the curves below.
Please cite any of the papers here if you want to cite this repo.
The curves below were generated at commit cd6c30.

DDPG/TD3 evaluation performance. (5 runs, mean + standard error)
PPO online performance. (5 runs, mean + standard error, smoothed by a window of size 10)
The implementations of the papers below are located in other branches of this repo and serve as good examples of how to use this codebase.
- Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation [COF-PAC, TD3-random]
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values [GradientDICE]
- Deep Residual Reinforcement Learning [Bi-Res-DDPG]
- Generalized Off-Policy Actor-Critic [Geoff-PAC, TD3-random]
- DAC: The Double Actor-Critic Architecture for Learning Options [DAC]
- QUOTA: The Quantile Option Architecture for Reinforcement Learning [QUOTA-discrete, QUOTA-continuous]
- ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search [ACE]