Safe Reinforcement Learning Algorithms



HCOPE (High-Confidence Off-Policy Evaluation)

Python implementation of the HCOPE lower-bound evaluation described in the paper: Thomas, Philip S., Georgios Theocharous, and Mohammad Ghavamzadeh. "High-Confidence Off-Policy Evaluation." AAAI. 2015.

CUT Inequality
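
The lower bound can be sketched as follows, assuming the form of the concentration inequality stated as Theorem 1 of Thomas et al. (2015): each sample X_i is non-negative, is clipped at a threshold c_i > 0, and the bound holds with probability at least 1 - delta. The function name cut_lower_bound and its arguments are illustrative and are not taken from hcope.py; the constants are worth checking against the paper before relying on them.

```python
import numpy as np


def cut_lower_bound(x, c, delta=0.05):
    """Return a (1 - delta) confidence lower bound on the mean of x.

    x     : non-negative samples (e.g. importance-weighted returns)
    c     : clipping threshold(s), a scalar or one value per sample, all > 0
    delta : allowed failure probability
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two samples")
    c = np.broadcast_to(np.asarray(c, dtype=float), x.shape)
    if np.any(x < 0) or np.any(c <= 0):
        raise ValueError("samples must be >= 0 and thresholds must be > 0")

    z = np.minimum(x, c) / c          # clipped samples rescaled to [0, 1]
    inv_c_sum = np.sum(1.0 / c)
    log_term = np.log(2.0 / delta)

    # sum over all pairs (i, j) of (z_i - z_j)^2, computed without an n x n array
    pair_sq = max(2.0 * n * np.sum(z ** 2) - 2.0 * np.sum(z) ** 2, 0.0)

    empirical = np.sum(z) / inv_c_sum
    bias = 7.0 * n * log_term / (3.0 * (n - 1)) / inv_c_sum
    spread = np.sqrt(log_term / (n - 1) * pair_sq) / inv_c_sum
    return float(empirical - bias - spread)
```

In an off-policy setting, x would hold the importance-weighted returns of the evaluation policy computed from behavior-policy trajectories. A larger threshold c reduces the clipping bias but loosens the bound, so c is a tuning choice.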

Requirements

Running Instructions

  1. Modify the environment in the main function, choosing one from OpenAI Gym. (The code currently supports discrete action spaces.)
  2. Run python hcope.py

Notes


Safe Exploration in Continuous Action Spaces

Paper: Safe Exploration in Continuous Action Spaces - Dalal et al.

Running Instructions

Results

Safe Exploration

Instability due to Safe Exploration

Explanation

Safety Signal

Safety Layer

Action Correction
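
The action correction step can be sketched as follows, assuming the closed-form solution from Dalal et al. for a safety signal linearized as c_i(s') ≈ c_i(s) + g_i(s)·a, with at most one constraint active at a time. The function correct_action and its argument names are hypothetical and do not come from this repository; in practice g would be the output of the learned safety-signal model.

```python
import numpy as np


def correct_action(action, g, c, c_max, eps=1e-8):
    """Project a proposed action onto the linearized safe set.

    action : proposed action from the policy, shape (action_dim,)
    g      : constraint sensitivities, shape (num_constraints, action_dim)
    c      : current safety-signal values, shape (num_constraints,)
    c_max  : allowed upper limits, shape (num_constraints,)
    """
    action = np.asarray(action, dtype=float)
    g = np.atleast_2d(np.asarray(g, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    c_max = np.atleast_1d(np.asarray(c_max, dtype=float))

    # Predicted violation of each constraint if the action is taken unchanged.
    violation = g @ action + c - c_max
    # Lagrange multiplier per constraint, clipped at zero (no correction if safe).
    # eps guards against division by zero when a row of g is all zeros.
    lam = np.maximum(violation / np.maximum(np.sum(g * g, axis=1), eps), 0.0)

    # Correct along the gradient of the most violated constraint only.
    worst = int(np.argmax(lam))
    return action - lam[worst] * g[worst]
```

For example, with action = [1.0, 0.0], g = [[1.0, 0.0]], c = [0.5] and c_max = [1.0], the predicted signal would be 1.5, so the action is pulled back to [0.5, 0.0], which places the predicted signal exactly on the limit.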


Importance Sampling

Implementation of:

Comparison of different importance sampling estimators:
Different Importance sampling estimators

The image is taken from the PhD thesis of P. Thomas:
Link: https://people.cs.umass.edu/~pthomas/papers/Thomas2015c.pdf
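
As a point of reference for the comparison, two standard estimators, ordinary importance sampling (IS) and weighted importance sampling (WIS), can be sketched as below. This is an illustrative example only: the function names are hypothetical, and it is not a statement of which estimators this repository implements.

```python
import numpy as np


def importance_weights(pi_e_probs, pi_b_probs):
    """Per-trajectory weight: product over time steps of pi_e(a|s) / pi_b(a|s).

    Each argument is a list with one sequence of action probabilities per
    trajectory; trajectories may have different lengths.
    """
    return np.array([
        np.prod(np.asarray(pe, dtype=float) / np.asarray(pb, dtype=float))
        for pe, pb in zip(pi_e_probs, pi_b_probs)
    ])


def is_estimate(returns, weights):
    """Ordinary IS: unbiased, but can have very high variance."""
    returns = np.asarray(returns, dtype=float)
    return float(np.mean(weights * returns))


def wis_estimate(returns, weights):
    """Weighted IS: normalizes by the sum of weights; biased, lower variance."""
    returns = np.asarray(returns, dtype=float)
    return float(np.sum(weights * returns) / np.sum(weights))
```

Here pi_e_probs and pi_b_probs hold the probabilities that the evaluation and behavior policies assign to the actions actually taken, and returns holds the observed return of each trajectory.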


Side Effects

Penalizing side effects using relative reachability

Code - https://github.com/hari-sikchi/safeRL/tree/safe_recovery/side_effects

The relative reachability measure
Equation relative reachability
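
A minimal sketch of the measure, assuming the definition in Krakovna et al.: the average over all states s of max(R(baseline; s) - R(current; s), 0), where R(x; s) is the discounted reachability of s from x. The sketch further assumes a small deterministic environment given as an adjacency dictionary, so that reachability reduces to gamma raised to the shortest-path length. All names are hypothetical and are not taken from the linked code.

```python
from collections import deque


def shortest_distances(transitions, start):
    """Breadth-first search: number of steps from start to each reachable state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def reachability(transitions, start, gamma):
    """Discounted reachability gamma ** steps; unreachable states are omitted."""
    return {s: gamma ** d for s, d in shortest_distances(transitions, start).items()}


def relative_reachability(transitions, states, current, baseline, gamma=0.9):
    """Average loss in reachability of the current state relative to the baseline."""
    r_cur = reachability(transitions, current, gamma)
    r_base = reachability(transitions, baseline, gamma)
    total = sum(
        max(r_base.get(s, 0.0) - r_cur.get(s, 0.0), 0.0)  # only losses count
        for s in states
    )
    return total / len(states)
```

With states "intact" and "broken", where "broken" can never return to "intact", moving from a baseline of "intact" to a current state of "broken" yields a penalty of 0.5: the "intact" state is no longer reachable, while nothing is lost for "broken" itself.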

Paper: Penalizing side effects using stepwise relative reachability - Krakovna et al.