State Representation Learning Zoo with PyTorch (part of S-RL Toolbox)

A collection of State Representation Learning (SRL) methods for Reinforcement Learning, written using PyTorch.

SRL Zoo Documentation: https://srl-zoo.readthedocs.io/

S-RL Toolbox Documentation: https://s-rl-toolbox.readthedocs.io/

S-RL Toolbox Repository: https://github.com/araffin/robotics-rl-srl

Available methods:

Autoencoder (reconstruction loss)
Denoising Autoencoder (DAE)
Forward Dynamics model
Inverse Dynamics model
Reward prediction loss
Variational Autoencoder (VAE) and beta-VAE
SRL with Robotic Priors + extensions (stereovision, additional priors)
Supervised Learning
Principal Component Analysis (PCA)
Triplet Network (for stereovision only)
Combination and stacking of methods
Random Features
[experimental] Reward Prior, Episode-prior, Perceptual Similarity loss (DARLA), Mutual Information loss

Documentation

Documentation is available online: https://srl-zoo.readthedocs.io/

Installation

Please read the documentation for more details. We provide Anaconda environment files and Docker images.

Learning a State Representation

To learn a state representation, you need to enforce constraints on the representation using one or more losses. For example, to train an autoencoder, you need to use a reconstruction loss. Most losses are not mutually exclusive, so you can combine them.

All losses are defined in losses/losses.py. The available losses are:

autoencoder: reconstruction loss, using current and next observation
denoising autoencoder (dae): same as for the autoencoder, except that the model reconstructs inputs from noisy observations containing a random zero-pixel mask
vae: (beta)-VAE loss (reconstruction + Kullback-Leibler divergence loss)
inverse: predict the action given current and next state
forward: predict the next state given current state and taken action
reward: predict the reward (positive or not) given current and next state
priors: robotic priors losses (see "Learning State Representations with Robotic Priors")
triplet: triplet loss for multi-cam setting (see Multiple Cameras section in the doc)
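
As a rough illustration of what some of the losses listed above compute, here is a minimal PyTorch sketch. It is not the code from losses/losses.py: the function names, the `drop_prob` parameter, the use of mean squared error for reconstruction, and the assumption of discrete actions (cross-entropy for the inverse model) are all illustrative choices.

```python
import torch
import torch.nn.functional as F


def zero_pixel_mask(obs, drop_prob=0.1):
    """DAE input corruption: set each pixel to zero with probability drop_prob."""
    keep = (torch.rand_like(obs) > drop_prob).float()
    return obs * keep


def beta_vae_loss(recon, target, mu, logvar, beta=1.0):
    """Reconstruction error plus beta-weighted KL divergence to N(0, I)."""
    recon_loss = F.mse_loss(recon, target, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon_loss + beta * kl) / target.size(0)


def inverse_loss(inverse_model, state, next_state, action):
    """Predict the action (assumed discrete) from current and next state."""
    logits = inverse_model(torch.cat([state, next_state], dim=1))
    return F.cross_entropy(logits, action)


def forward_loss(forward_model, state, action_one_hot, next_state):
    """Predict the next state from the current state and the action taken."""
    pred = forward_model(torch.cat([state, action_one_hot], dim=1))
    return F.mse_loss(pred, next_state)


# Toy usage with hypothetical dimensions and linear models.
torch.manual_seed(0)
state_dim, n_actions, batch = 3, 4, 8
inverse_model = torch.nn.Linear(2 * state_dim, n_actions)
forward_model = torch.nn.Linear(state_dim + n_actions, state_dim)
state = torch.randn(batch, state_dim)
next_state = torch.randn(batch, state_dim)
action = torch.randint(0, n_actions, (batch,))

inv = inverse_loss(inverse_model, state, next_state, action)
fwd = forward_loss(forward_model, state,
                   F.one_hot(action, n_actions).float(), next_state)
```

In this sketch both dynamics losses operate on learned states rather than raw observations, so in a full model `state` and `next_state` would come from the encoder applied to the current and next observation.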

[Experimental]

reward-prior: Maximises the correlation between states and rewards (does not make sense for sparse reward)
episode-prior: Learn an episode-agnostic state space, thanks to a discriminator distinguishing states from same/different episodes
perceptual similarity loss (for VAE): Instead of the reconstruction loss in the beta-VAE loss, it uses the distance between the reconstructed input and real input in the embedding of a pre-trained DAE.
mutual information loss: Maximises the mutual information between states and rewards

All possible arguments can be displayed using python train.py --help. You can limit the training set size (--training-set-size argument), change the minibatch size (-bs), the number of epochs (--epochs), etc.

Datasets: Simulated Environments and Real Robots

Although the data can be generated easily in simulation using the RL repo (cf. Generating Data), we also provide datasets recorded with a real Baxter robot:

Dataset 1
Dataset 2 with multiple cameras

Examples

You can download an example dataset here.

Train an inverse model:

python train.py --data-folder data/path/to/dataset --losses inverse

Train an autoencoder:

python train.py --data-folder data/path/to/dataset --losses autoencoder

Combining an autoencoder with an inverse model is as easy as:

python train.py --data-folder data/path/to/dataset --losses autoencoder inverse

You can also specify the weight of each loss:

python train.py --data-folder data/path/to/dataset --losses autoencoder:1 inverse:10
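
To make the name:weight syntax concrete, the following sketch shows one plausible way such specifications could be turned into a weighted sum of losses. It is an assumption for illustration only, not the parsing code used by train.py: `parse_loss_specs`, `weighted_total`, and the default weight of 1.0 for a loss given without a weight are hypothetical.

```python
def parse_loss_specs(specs, default_weight=1.0):
    """Map specs such as 'inverse:10' to {'inverse': 10.0}.

    A spec without ':' gets default_weight (a hypothetical default).
    """
    weights = {}
    for spec in specs:
        name, sep, weight = spec.partition(":")
        weights[name] = float(weight) if sep else default_weight
    return weights


def weighted_total(loss_values, weights):
    """Weighted sum of the individual loss values."""
    return sum(weights[name] * value for name, value in loss_values.items())


weights = parse_loss_specs(["autoencoder:1", "inverse:10"])
# Made-up loss values for one minibatch: 1 * 0.5 + 10 * 0.02
total = weighted_total({"autoencoder": 0.5, "inverse": 0.02}, weights)
```

With weights like these, the inverse loss contributes ten times more per unit of error than the reconstruction loss, which is useful when the raw loss values differ by orders of magnitude.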

Please read the documentation for more examples.

Running Tests

Download the test datasets kuka_gym_test and kuka_gym_dual_test and put them in the data/ folder.

./run_tests.sh

Troubleshooting

CUDA out of memory error

python train.py --data-folder data/staticButtonSimplest

RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

SOLUTION 1: Decrease the batch size, e.g. to 32-64 on GPUs with little memory.

SOLUTION 2: Use a simple 2-layer neural network model:

python train.py --data-folder data/staticButtonSimplest --model-type mlp
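
To see why decreasing the batch size (SOLUTION 1) helps, the back-of-the-envelope sketch below estimates the memory taken by the input tensor alone. The 3x224x224 image size and 32-bit floats are assumptions made for illustration; intermediate activations and gradients usually take far more memory than the inputs, but they also grow roughly linearly with the batch size.

```python
def batch_input_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Bytes needed to store one batch of images as a dense float tensor."""
    return batch_size * channels * height * width * bytes_per_value


# Hypothetical image shape: 3 channels, 224x224 pixels, float32 values.
for bs in (32, 64, 256):
    mib = batch_input_bytes(bs, 3, 224, 224) / 2**20
    print(f"batch size {bs}: {mib:.3f} MiB of input data")
```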