Policy Gradient (PG) & Actor-Critic - Simple Keras Implementation

Description

This is an implementation of Policy Gradient and Actor-Critic agents playing Pong and CartPole from OpenAI Gym.
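For readers new to policy gradient, here is a minimal sketch of the quantity this kind of agent typically trains on: the discounted return of each step, standardized across the episode. This is an illustration only, not code from this project. The function names `discounted_returns` and `standardize` and the default `gamma` value are assumptions made for the example.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t.

    Hypothetical helper for illustration; gamma=0.99 is an assumed default.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardize(values: List[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to unit variance, a common variance-reduction step."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    episode_rewards = [0.0, 0.0, 1.0]  # e.g. a point scored on the last step
    returns = discounted_returns(episode_rewards, gamma=0.5)
    print(returns)
    print(standardize(returns))
```

In a Keras setting, one common way to use such values is to pass them as per-sample weights on a cross-entropy loss over the actions taken. Whether this project does exactly that is not stated here.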

Here's a quick demo of the agent trained by PG playing Pong.

Using Keras, I've tried my best to implement deep reinforcement learning algorithms without complicated tensor/session operations. In this project, the following techniques have been implemented:

Here's an architecture overview of the PG model playing Pong in this work:

And the learning curve:

This project is derived from an assignment for the course Applied Deep Learning, which I took in fall 2017. The code is currently not maintained, but I'll try my best to help if there are questions.

Requirements

The following packages are required. You can install each of them with pip3 install [package]

Setup

References