Policy Gradient (PG) Algorithms


This repository contains PyTorch (v0.4.0) implementations of typical policy gradient (PG) algorithms.

We have implemented and trained the agents with these PG algorithms on the following benchmarks: MuJoCo (via mujoco-py) and Unity ML-Agents. Trained agents and Unity ML-Agents environment source files will soon be available in this repo!

For reference, solid reviews (in Korean) of the papers behind these PG algorithms are available at https://reinforcement-learning-kr.github.io/2018/06/29/0_pg-travel-guide/. Enjoy!
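All of the algorithms in this repo build on the same core idea: push up the log-probability of actions in proportion to the return they earned. The sketch below is not the repository's code; it is a minimal REINFORCE-style update with a hypothetical Gaussian policy (PolicyNet), using Hopper-v2's observation and action sizes only for illustration.

    # Minimal vanilla policy gradient (REINFORCE) sketch -- illustrative only,
    # not the repository's implementation.
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):                      # hypothetical Gaussian policy
        def __init__(self, obs_dim, act_dim, hidden=64):
            super(PolicyNet, self).__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, act_dim))
            self.log_std = nn.Parameter(torch.zeros(act_dim))

        def forward(self, obs):
            return torch.distributions.Normal(self.body(obs), self.log_std.exp())

    def pg_loss(policy, obs, actions, returns):
        # loss = -E[ log pi(a|s) * R ]; its gradient is the vanilla policy gradient
        log_prob = policy(obs).log_prob(actions).sum(dim=-1)
        return -(log_prob * returns).mean()

    policy = PolicyNet(obs_dim=11, act_dim=3)        # Hopper-v2 sizes
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    obs, actions, returns = torch.randn(32, 11), torch.randn(32, 3), torch.randn(32)
    loss = pg_loss(policy, obs, actions, returns)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

NPG, TRPO, and PPO differ mainly in how they constrain or clip this update, not in the gradient itself.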


Mujoco-py

1. Installation

2. Train

Navigate to the pg_travel/mujoco folder

Basic Usage

Train the agent with PPO using Hopper-v2 without rendering.

python main.py

Train the agent with TRPO using HalfCheetah-v2 with rendering.

python main.py --algorithm TRPO --env HalfCheetah-v2 --render

Continue training from the saved checkpoint

python main.py --load_model ckpt_736.pth.tar
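The ckpt_*.pth.tar files are ordinary PyTorch checkpoints. The repo's exact checkpoint keys may differ; the pattern below is only a sketch of how a model and optimizer are typically saved and restored so that training can resume, with assumed key names.

    # Illustrative checkpoint pattern; the key names are assumptions,
    # not necessarily the ones used by this repository.
    import torch

    def save_checkpoint(path, model, optimizer, iteration):
        torch.save({'iteration': iteration,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()}, path)

    def load_checkpoint(path, model, optimizer):
        ckpt = torch.load(path)
        model.load_state_dict(ckpt['model_state_dict'])
        optimizer.load_state_dict(ckpt['optimizer_state_dict'])
        return ckpt['iteration']                     # iteration to resume from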

Test the pretrained model

Play 5 episodes with the saved model ckpt_736.pth.tar

python test_algo.py --load_model ckpt_736.pth.tar --iter 5
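test_algo.py already does this for you; the sketch below only illustrates what playing a few episodes with a saved policy typically involves. It reuses the hypothetical PolicyNet from the sketch above and the pre-0.26 Gym step API (obs, reward, done, info), which matches the Gym versions contemporary with this repo.

    # Illustrative evaluation loop -- not test_algo.py itself.
    import gym
    import torch

    env = gym.make('Hopper-v2')
    policy = PolicyNet(env.observation_space.shape[0], env.action_space.shape[0])
    # policy.load_state_dict(torch.load('ckpt_736.pth.tar')['model_state_dict'])  # assumed key

    for episode in range(5):
        obs, done, score = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                action = policy(torch.as_tensor(obs, dtype=torch.float32)).mean  # deterministic action
            obs, reward, done, _ = env.step(action.numpy())
            score += reward
        print('episode %d  score %.1f' % (episode, score))
    env.close()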

Modify the hyperparameters

Hyperparameters are listed in hparams.py. Change the hyperparameters according to your preference.
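The exact contents of hparams.py may differ between releases; the block below is only a hypothetical example of the kind of values usually collected there.

    # Hypothetical hyperparameter module -- names and values are illustrative;
    # check hparams.py for the real ones.
    class HyperParams:
        gamma = 0.99        # discount factor
        lamda = 0.98        # GAE lambda
        hidden = 64         # hidden layer size
        actor_lr = 3e-4     # actor learning rate
        critic_lr = 3e-4    # critic learning rate
        batch_size = 64     # minibatch size
        clip_param = 0.2    # PPO clipping epsilon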

3. Tensorboard

We have integrated TensorboardX to monitor training progress; a minimal logging sketch follows the command below.

Navigate to the pg_travel/mujoco folder

tensorboard --logdir logs
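Under the hood, TensorboardX logging is just a SummaryWriter writing scalars into the logs directory; the tag name below is illustrative, not necessarily the one this repo uses.

    # Minimal TensorboardX usage sketch; 'log/score' is an illustrative tag.
    from tensorboardX import SummaryWriter

    writer = SummaryWriter('logs')               # the directory passed to --logdir
    for iteration in range(3):
        score = float(iteration)                 # stand-in for the episode score
        writer.add_scalar('log/score', score, iteration)
    writer.close()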

4. Trained Agent

We have trained the agents with four different PG algorithms on the Hopper-v2 environment.

[Score plot and GIF for each algorithm: Vanilla PG, NPG, TRPO, PPO]

Unity ml-agents

1. Installation

2. Environments

We have modified the Walker environment provided by Unity ML-Agents.

[Overview images: the Walker agent, the Plane environment, and the Curved environment]

Description

Prebuilt Unity environments

3. Train

Navigate to the pg_travel/unity folder

Basic Usage

Train the Walker agent with PPO using the Plane environment without rendering.

python main.py --train

Continue training from the saved checkpoint

python main.py --load_model ckpt_736.pth.tar --train

Test the pretrained model

python main.py --render --load_model ckpt_736.pth.tar

Modify the hyperparameters

See main.py for the default hyperparameter settings and pass hyperparameter arguments on the command line to override them.
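Only --train, --render, and --load_model are confirmed by the commands above; the argparse sketch below is a hypothetical illustration of how such command-line hyperparameters are typically exposed, and the remaining flag names are assumptions rather than main.py's actual interface.

    # Hypothetical argparse setup -- see main.py for the real flags.
    import argparse

    parser = argparse.ArgumentParser(description='PG travel on the Unity Walker')
    parser.add_argument('--train', action='store_true', help='train instead of test')
    parser.add_argument('--render', action='store_true', help='render the environment')
    parser.add_argument('--load_model', type=str, default=None, help='checkpoint to resume from')
    parser.add_argument('--gamma', type=float, default=0.99, help='discount factor (assumed flag)')
    parser.add_argument('--actor_lr', type=float, default=3e-4, help='actor learning rate (assumed flag)')
    args = parser.parse_args()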

4. Tensorboard

We have integrated TensorboardX to monitor training progress.

Navigate to the pg_travel/unity folder

tensorboard --logdir logs

5. Trained Agent

We have trained the agents with PPO on the Plane and Curved environments.

[GIFs of the trained agent in the Plane and Curved environments]

Reference

We referenced code from the following repositories.