PyTorch implementation of TRPO

This repo contains a PyTorch implementation of a Trust Region Policy Optimization (TRPO) agent for environments with a discrete action space.
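For readers new to TRPO, the sketch below illustrates the two quantities a TRPO update balances when the action space is discrete: the surrogate objective (probability ratio times advantage) and the mean KL divergence between the old and new policies. It is not taken from this repo; it is written in plain Python rather than PyTorch so it runs without dependencies, and the function names and the `max_kl=0.01` default are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Turn a list of action logits into a categorical distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surrogate_objective(new_probs, old_probs, actions, advantages):
    """Mean of (pi_new(a|s) / pi_old(a|s)) * advantage over sampled steps."""
    terms = [
        new_p[a] / old_p[a] * adv
        for new_p, old_p, a, adv in zip(new_probs, old_probs, actions, advantages)
    ]
    return sum(terms) / len(terms)


def mean_kl(old_probs, new_probs):
    """Mean KL(pi_old || pi_new) over the sampled states."""
    kls = [
        sum(p * math.log(p / q) for p, q in zip(old_p, new_p) if p > 0.0)
        for old_p, new_p in zip(old_probs, new_probs)
    ]
    return sum(kls) / len(kls)


def accept_step(new_probs, old_probs, actions, advantages, max_kl=0.01):
    """Acceptance test of the kind used in a TRPO backtracking line search.

    A candidate policy is kept only if it improves the surrogate objective
    and stays inside the trust region (mean KL <= max_kl, an assumed value).
    """
    baseline = surrogate_objective(old_probs, old_probs, actions, advantages)
    candidate = surrogate_objective(new_probs, old_probs, actions, advantages)
    within_region = mean_kl(old_probs, new_probs) <= max_kl
    return candidate > baseline and within_region
```

As a usage example, with two sampled states, `old = [softmax([0.0, 0.0])] * 2`, `actions = [0, 1]`, and `advantages = [1.0, -1.0]`, a small shift such as `new = [softmax([0.1, 0.0])] * 2` passes `accept_step`, while a large shift such as `[softmax([3.0, 0.0])] * 2` is rejected because it leaves the trust region.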

Environment Setup

conda create --name trpo --file requirements/conda_requirements.txt
source activate trpo
pip install -r requirements/pip_requirements.txt

To train an agent, run:

python run_trpo.py --env=GYM_ENV_ID

where GYM_ENV_ID is the ID of the Gym environment you wish to train on. The environment must have a discrete action space.

trpo_pong_gif

A game of Pong played using the policy learned by the TRPO agent

trpo_pong_png

Plot of total reward per episode of Pong vs. episode number