This repository contains PyTorch (v0.4.0) implementations of typical policy gradient (PG) algorithms.

We have implemented and trained the agents with the PG algorithms using the following benchmarks. Trained agents and the Unity ml-agents environment source files will soon be available in our repo!

For reference, solid reviews (in Korean) of the papers behind these PG algorithms are available at https://reinforcement-learning-kr.github.io/2018/06/29/0_pg-travel-guide/. Enjoy!
* Navigate to the `pg_travel/mujoco` folder.
* Train the agent with `PPO` using `Hopper-v2` without rendering:

  ```
  python main.py
  ```

  * Models are saved in the `save_model` folder automatically every 100th iteration.
* Train the agent with `TRPO` using `HalfCheetah-v2` with rendering:

  ```
  python main.py --algorithm TRPO --env HalfCheetah-v2 --render
  ```
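The automatic checkpointing described above can be sketched roughly as follows. This is a minimal illustration with a hypothetical stand-in network and optimizer; the repo's actual saving logic lives in `main.py`:

```python
import os
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for the actor network and its optimizer;
# the real networks live in the repo's agent modules.
model = nn.Linear(11, 3)  # e.g. Hopper-v2: 11-dim observation, 3-dim action
optimizer = optim.Adam(model.parameters(), lr=3e-4)

save_dir = "save_model"
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)

for iteration in range(1, 201):
    # ... collect rollouts and update the policy here ...
    if iteration % 100 == 0:  # checkpoint every 100th iteration
        path = os.path.join(save_dir, "ckpt_{}.pth.tar".format(iteration))
        torch.save({"iteration": iteration,
                    "model_state_dict": model.state_dict(),
                    "optimizer_state_dict": optimizer.state_dict()}, path)
```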
* Continue training from a saved checkpoint:

  ```
  python main.py --load_model ckpt_736.pth.tar
  ```

  * The `ckpt_736.pth.tar` file should be in the `pg_travel/mujoco/save_model` folder.
  * Specify `algorithm` and/or `env` if they are not `PPO` and/or `Hopper-v2`.
* Play 5 episodes with the saved model `ckpt_736.pth.tar`:

  ```
  python test_algo.py --load_model ckpt_736.pth.tar --iter 5
  ```

  * The `ckpt_736.pth.tar` file should be in the `pg_travel/mujoco/save_model` folder.
  * Specify `env` if it is not `Hopper-v2`.
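Resuming from a checkpoint such as `ckpt_736.pth.tar` boils down to restoring the saved state dicts. Below is a minimal sketch, assuming the checkpoint stores model and optimizer state; the actual checkpoint format is defined in `main.py`:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(11, 3)              # hypothetical actor network
optimizer = optim.Adam(model.parameters(), lr=3e-4)

# Create a checkpoint first, purely for illustration.
torch.save({"iteration": 736,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict()},
           "ckpt_736.pth.tar")

# Resume: restore the states and continue from the saved iteration.
checkpoint = torch.load("ckpt_736.pth.tar")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_iteration = checkpoint["iteration"] + 1
```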
* Hyperparameters are listed in `hparams.py`. Change the hyperparameters according to your preference.
We have integrated TensorboardX to observe training progress. Logs are saved in the `logs` folder.

* Navigate to the `pg_travel/mujoco` folder.
* Run:

  ```
  tensorboard --logdir logs
  ```
We have trained the agents with four different PG algorithms using the `Hopper-v2` env.

Algorithm | Score | GIF
---|---|---
Vanilla PG | |
NPG | |
TRPO | |
PPO | |
We have modified the `Walker` environment provided by Unity ml-agents.

Overview | Image
---|---
Walker |
Plane Env |
Curved Env |

* Description
* Reward
* Done
The Unity environment files are in the `pg_travel/unity/env` folder.

* Navigate to the `pg_travel/unity` folder.
* Train the walker agent with `PPO` using the `Plane` environment without rendering:

  ```
  python main.py --train
  ```
See `pg_travel/mujoco/agent/ppo_gae.py` for just single-agent training.
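`ppo_gae.py` combines PPO with generalized advantage estimation (GAE). The core advantage recursion can be sketched as follows; the `gamma` and `lam` values here are illustrative, and the repo's actual implementation lives in that file:

```python
def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, masks are lists of floats;
    masks[t] is 0.0 where the episode ended at step t, else 1.0.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = 0.0
    # Walk backwards so each step's advantage folds in future TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value * masks[t] - values[t]
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
        next_value = values[t]
    # Returns (advantage + value) serve as value-function targets.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```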
* Models are saved in the `save_model` folder automatically every 100th iteration.
* Continue training from a saved checkpoint:

  ```
  python main.py --load_model ckpt_736.pth.tar --train
  ```

  * The `ckpt_736.pth.tar` file should be in the `pg_travel/unity/save_model` folder.
* Play with the saved model:

  ```
  python main.py --render --load_model ckpt_736.pth.tar
  ```

  * The `ckpt_736.pth.tar` file should be in the `pg_travel/unity/save_model` folder.
* See `main.py` for default hyperparameter settings. Pass the hyperparameter arguments according to your preference.
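Passing hyperparameters on the command line is typically wired up with `argparse`. A sketch with hypothetical defaults (the actual argument names and defaults are defined in `main.py`; `--gamma` here is an assumed example):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gamma", type=float, default=0.99,
                    help="discount factor (hypothetical default)")
parser.add_argument("--load_model", type=str, default=None)
parser.add_argument("--render", action="store_true")
parser.add_argument("--train", action="store_true")

# Simulate `python main.py --train --gamma 0.995` for illustration.
args = parser.parse_args(["--train", "--gamma", "0.995"])
```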
We have integrated TensorboardX to observe training progress.

* Navigate to the `pg_travel/unity` folder.
* Run:

  ```
  tensorboard --logdir logs
  ```
We have trained the agents with `PPO` using the `Plane` and `Curved` envs.

Env | GIF
---|---
Plane |
Curved |
We referenced the code from the repositories below.