StarTrader:
Intelligent Trading Agent Development
with Deep Reinforcement Learning

Introduction

This project sets out to create an intelligent trading agent and a trading environment that provides an ideal learning ground. A real-world trading environment is complex: stock data, related instruments, macroeconomic indicators, news and possibly alternative data all come into consideration. An effective agent must derive efficient representations of the environment from high-dimensional input and generalize past experience to new situations. The project adopts a deep reinforcement learning algorithm, deep deterministic policy gradient (DDPG), to trade a portfolio of five stocks. Different reward systems and hyperparameters were tried. The agent's performance is compared with models built using a recurrent neural network, modern portfolio theory, simple buy-and-hold and the benchmark DJIA index. The agent and environment are then evaluated to identify possible improvements and to assess the agent's potential to beat professional human traders, just like DeepMind's Alpha series of intelligent game-playing agents.

The trading agent learns and trades in an OpenAI Gym environment. Two Gym environments are created for this purpose: one for training (StarTrader-v0) and another for testing (StarTraderTest-v0). Both use the deep deterministic policy gradient (DDPG) implementation from OpenAI Baselines.

A portfolio of five stocks (out of 27 Dow Jones Industrial Average stocks) is selected based on a non-correlation factor. StarTrader trades these five non-correlated stocks and learns to maximize total asset (portfolio value + current account balance) as its goal. During the trading process, StarTrader-v0 also optimizes the portfolio by deciding how many stock units to trade for each of the five stocks.
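As a rough illustration of the two ideas above (total asset as the quantity to maximize, and per-stock unit decisions), here is a minimal sketch. It is not the environment's actual code: the function names, the `max_units` cap, the [-1, 1] action range and the sell-then-buy order are all assumptions made for the example, and trading costs are ignored.

```python
from typing import List, Tuple


def total_asset(balance: float, holdings: List[int], prices: List[float]) -> float:
    """Total asset = market value of held shares + cash balance."""
    portfolio_value = sum(units * price for units, price in zip(holdings, prices))
    return portfolio_value + balance


def apply_actions(
    balance: float,
    holdings: List[int],
    prices: List[float],
    actions: List[float],
    max_units: int = 5,  # hypothetical cap on units traded per stock per step
) -> Tuple[float, List[int]]:
    """Map continuous actions in [-1, 1] to share trades (sells first, then buys)."""
    new_holdings = list(holdings)
    # Clip each action to [-1, 1] and convert it to a signed number of units.
    orders = [int(round(max(-1.0, min(1.0, a)) * max_units)) for a in actions]

    # Sells: cannot sell more than is currently held.
    for i, units in enumerate(orders):
        if units < 0:
            sold = min(-units, new_holdings[i])
            new_holdings[i] -= sold
            balance += sold * prices[i]

    # Buys: limited by the remaining cash balance.
    for i, units in enumerate(orders):
        if units > 0:
            affordable = int(balance // prices[i])
            bought = min(units, affordable)
            new_holdings[i] += bought
            balance -= bought * prices[i]

    return balance, new_holdings


if __name__ == "__main__":
    prices = [100.0, 50.0, 20.0, 10.0, 5.0]
    balance, holdings = apply_actions(1000.0, [0, 10, 0, 0, 0], prices, [1.0, -0.4, 0.0, 0.0, 2.0])
    print(balance, holdings, total_asset(balance, holdings, prices))
```

Without trading costs, a trade only moves value between cash and shares, so total asset is unchanged at the moment of trading and changes only as prices move.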

Based on the non-correlation factor, a portfolio optimization algorithm chose the following five stocks to trade:

American Express
Wal Mart
UnitedHealth Group
Apple
Verizon Communications
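The README does not describe how the non-correlation selection works internally. One plausible approach, shown here purely as a sketch, is a greedy search over the pairwise correlations of daily returns: start with the least correlated pair, then keep adding the stock with the lowest average absolute correlation to those already chosen. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pick_least_correlated(returns: Dict[str, List[float]], k: int) -> List[str]:
    """Greedily pick k tickers with low absolute pairwise correlation."""
    tickers = list(returns)
    corr = {
        frozenset((a, b)): abs(pearson(returns[a], returns[b]))
        for a, b in combinations(tickers, 2)
    }
    # Seed with the least correlated pair.
    chosen = sorted(min(corr, key=corr.get))
    while len(chosen) < k:
        remaining = [t for t in tickers if t not in chosen]
        # Add the ticker whose average |correlation| with the chosen set is lowest.
        best = min(
            remaining,
            key=lambda t: sum(corr[frozenset((t, c))] for c in chosen) / len(chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    toy_returns = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 9.0],   # moves almost in lockstep with A
        "C": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with A
    }
    print(pick_least_correlated(toy_returns, 2))
```

A greedy search is not guaranteed to find the globally least correlated subset, but it avoids enumerating every five-stock combination.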

The preprocessing function creates technical data derived from each stock's OHLCV data. On average, roughly 6-8 derived time series are kept for each stock.
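To make "technical data derived from OHLCV" concrete, the sketch below computes two common indicators from closing prices in plain Python. Which indicators the project actually derives is not stated here (TA-Lib is mentioned under the special instructions, so it is likely used instead), and the function names are illustrative only.

```python
from typing import List, Optional


def simple_moving_average(close: List[float], window: int) -> List[Optional[float]]:
    """Average of the last `window` closes; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(close)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(close[i + 1 - window : i + 1]) / window)
    return out


def rate_of_change(close: List[float], period: int) -> List[Optional[float]]:
    """Fractional price change over `period` steps; None for the first `period` rows."""
    out: List[Optional[float]] = []
    for i in range(len(close)):
        if i < period:
            out.append(None)
        else:
            out.append((close[i] - close[i - period]) / close[i - period])
    return out


if __name__ == "__main__":
    close = [10.0, 11.0, 12.0, 13.0]
    print(simple_moving_average(close, 2))
    print(rate_of_change(close, 1))
```

The leading None values matter in practice: rows without a full indicator history usually have to be dropped before the data reaches the agent.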

Apart from stock data, context data is also used to aid learning:

S&P 500 index
Dow Jones Industrial Average index
NASDAQ Composite index
Russell 2000 index
SPDR S&P 500 ETF
Invesco QQQ Trust
CBOE Volatility Index
SPDR Gold Shares
Treasury Yield 30 Years
CBOE Interest Rate 10 Year T Note
iShares 1-3 Year Treasury Bond ETF
iShares Short Treasury Bond ETF

Similarly, technical data is derived from the OHLCV data of the above context series. All data preprocessing is handled by two modules:

data_preprocessing.py
feature_select.py

The preprocessed data is then fed directly to StarTrader's trading environment: class StarTradingEnv.

The feature selection module (feature_select.py) selects about 6-8 features out of the 41 OHLCV and derived technical data series. In total, there are 121 features (this may vary between machines, as the algorithm is not seeded), of which about 36 are stock features and the rest are context features.
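The selection criterion itself is not spelled out in this README. As a hedged sketch, assume each candidate series already has an importance score from some model; keeping the top k is then a simple ranking. The names below are hypothetical. Ties are broken by feature name so the ranking step is deterministic, although run-to-run variation in an unseeded model's scores would still change the result. The last lines check the feature budget quoted above against the 5 stocks and 12 context series listed earlier.

```python
from typing import Dict, List


def select_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance score."""
    # Sort by descending score, then by name so ties resolve the same way every run.
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Rough feature budget implied by the numbers in this README.
N_STOCKS = 5
N_CONTEXT = 12
TOTAL_FEATURES = 121
avg_per_series = TOTAL_FEATURES / (N_STOCKS + N_CONTEXT)  # about 7 per series

if __name__ == "__main__":
    scores = {"close": 0.30, "sma_10": 0.25, "rsi_14": 0.25, "volume": 0.20}
    print(select_top_k(scores, 3))
    print(round(avg_per_series, 2))
```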

When trading is executed, the 121 features, along with total asset, current asset holdings and unrealized profit and loss, form the complete state space for the agent to trade and learn. The state space is designed to give the agent a sense of the instantaneous environment, as well as of how its interactions with the environment affect the future state space. In other words, the trading agent bears the fruits and consequences of its own actions.
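The state described above can be pictured as one flat vector. The sketch below assumes that holdings and unrealized profit and loss are tracked per stock (five values each), which would give 121 + 1 + 5 + 5 = 132 entries; the actual layout and size in StarTradingEnv may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def unrealized_pnl(
    holdings: Sequence[int], avg_cost: Sequence[float], prices: Sequence[float]
) -> List[float]:
    """Per-stock paper profit or loss: (current price - average cost) * units held."""
    return [(p - c) * u for u, c, p in zip(holdings, avg_cost, prices)]


def build_state(
    features: Sequence[float],
    total_asset: float,
    holdings: Sequence[int],
    pnl: Sequence[float],
) -> List[float]:
    """Concatenate market features with the agent's own account information."""
    return list(features) + [total_asset] + [float(h) for h in holdings] + list(pnl)


if __name__ == "__main__":
    features = [0.0] * 121  # placeholder for the preprocessed market features
    holdings = [10, 0, 5, 0, 0]
    prices = [110.0, 50.0, 18.0, 10.0, 5.0]
    pnl = unrealized_pnl(holdings, [100.0, 0.0, 20.0, 0.0, 0.0], prices)
    state = build_state(features, 11190.0, holdings, pnl)
    print(len(state), pnl)
```

Including the account entries is what lets the agent see the consequences of its earlier trades, not only the market.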

Training agent on 9 iterations

Training iterations

Testing agent on one iteration

No learning or model refinement takes place here; the trained model is only tested. The trading agent survived the major market correction in 2018 with a Sharpe ratio of 1.13.
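For readers unfamiliar with the risk-adjusted metrics quoted in this README, here is a minimal sketch of how Sharpe and Sortino ratios are commonly computed from daily returns. The annualization factor of 252 trading days and the zero risk-free and target rates are assumptions; the project's own calculation may use different conventions.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List

TRADING_DAYS = 252  # assumed annualization factor


def sharpe_ratio(daily_returns: List[float], risk_free_daily: float = 0.0) -> float:
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(TRADING_DAYS)


def sortino_ratio(daily_returns: List[float], target_daily: float = 0.0) -> float:
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target_daily for r in daily_returns]
    downside = sqrt(sum(min(0.0, r) ** 2 for r in excess) / len(excess))
    return mean(excess) / downside * sqrt(TRADING_DAYS)


if __name__ == "__main__":
    returns = [0.01, -0.005, 0.02, -0.01, 0.015]
    print(round(sharpe_ratio(returns), 2), round(sortino_ratio(returns), 2))
```

Because the Sortino ratio ignores upside volatility, a strategy with large gains and small losses scores higher on Sortino than on Sharpe.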

Testing trained model with one iteration

Compare agent's performance with other trading strategies

DDPG is the best performer in terms of cumulative returns. However, with a much less volatile ride, the RNN-LSTM model has the better risk-adjusted return: the highest Sharpe ratio (1.88) and Sortino ratio (3.06). Both the RNN-LSTM and DRL-DDPG trading strategies incorporate trading costs, namely commission (based on Interactive Brokers' fees) and slippage (modelled by Zipline and based on each stock's daily volume), since there are many transactions during the trading window. Trading costs are omitted for the buy-and-hold strategies since the stocks are only transacted once. DDPG's reward system should be modified to yield a higher risk-adjusted return. For a fair comparison, the LSTM model uses the same training data and a similar backtester as the DDPG model.
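The two trading costs mentioned above can be sketched as follows. The numbers are assumptions, not values taken from this project: the commission uses a per-share rate with a per-order minimum in the style of a fixed-rate US stock schedule, and the slippage follows the general shape of a volume-share model, where price impact grows with the square of the order's share of daily volume. All function names and default parameters are hypothetical.

```python
def commission(units: int, per_share: float = 0.005, minimum: float = 1.0) -> float:
    """Per-share commission with a per-order minimum (assumed fee schedule)."""
    if units == 0:
        return 0.0
    return max(abs(units) * per_share, minimum)


def slipped_price(
    price: float,
    units: int,
    daily_volume: float,
    price_impact: float = 0.1,   # assumed impact coefficient
    volume_limit: float = 0.025,  # assumed cap on the share of daily volume
) -> float:
    """Execution price after volume-based slippage: buys fill higher, sells lower."""
    volume_share = min(abs(units) / daily_volume, volume_limit)
    direction = 1 if units > 0 else -1
    return price * (1 + direction * price_impact * volume_share ** 2)


if __name__ == "__main__":
    units, price, volume = 1000, 100.0, 100_000.0
    fill = slipped_price(price, units, volume)
    print(fill, commission(units), units * fill + commission(units))
```

For a strategy that trades every day these small costs compound, which is why they are applied to the DDPG and LSTM strategies but can reasonably be ignored for a single buy-and-hold purchase.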

Trading strategy performance returns comparison

Prerequisites

Python 3.6 or Anaconda with Python 3.6 environment
Python packages: pandas, numpy, matplotlib, statsmodels, sklearn, tensorflow
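Before going through the installation steps, it can help to check which of the packages above are already importable in the active environment. The helper below is a small convenience sketch and is not part of this repository.

```python
from importlib.util import find_spec
from typing import List

REQUIRED = ["pandas", "numpy", "matplotlib", "statsmodels", "sklearn", "tensorflow"]


def missing_packages(names: List[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Note that the import name sklearn corresponds to the scikit-learn package on PyPI.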

The code was written on a Linux machine and has been tested on two operating systems: Linux Ubuntu 16.04 and Windows 10 Pro.

Installation instructions:

Installation of system packages CMake, OpenMPI on Mac

brew install cmake openmpi
Activate the environment and install gym under this environment

pip install gym

Download Official Baseline Package

Clone the repo:

git clone https://github.com/openai/baselines.git

cd baselines

pip install -e .

Install Tensorflow

There are several ways of installing Tensorflow. This page provides a good description of how it can be done, taking the system OS, Python version and GPU availability into consideration.

https://www.tensorflow.org/install/

In short, after environment activation, Tensorflow can be installed with these commands:

Tensorflow for CPU:
pip3 install --upgrade tensorflow

Tensorflow for GPU:
pip3 install --upgrade tensorflow-gpu

Installing Tensorflow GPU allows faster training if your machine has built-in NVIDIA GPU(s). However, the Tensorflow GPU version requires the right versions of cuDNN and CUDA. These pages provide instructions to ensure the right versions are installed:

Ubuntu

MacOS: https://www.tensorflow.org/install/install_mac (Tensorflow 1.2 no longer provides GPU support for MacOS)

Windows
Place the StarTrader and StarTraderTest folders from this repository in your machine's OpenAI Gym environment folder:

gym/envs/
Replace the __init__.py file in the following folder with the __init__.py provided in this repository:

gym/envs/__init__.py
Copy run.py from the baselines folder to the folder where you want to execute it, for example:

From the Baselines installation:
baselines/baselines/run.py

To:
run.py
Place the 'data' folder in the folder where run.py resides

/data/
Replace ddpg.py in the Baselines installation with the ddpg.py in this repository:

In your machine's Baselines installation:
baselines/baselines/ddpg/ddpg.py

is replaced by the ddpg.py in this repository:
baselines/baselines/ddpg/ddpg.py
Replace ddpg_learner.py in the Baselines installation with the ddpg_learner.py in this repository:

In your machine's Baselines installation:
baselines/baselines/ddpg/ddpg_learner.py

is replaced by the ddpg_learner.py in this repository:
baselines/baselines/ddpg/ddpg_learner.py
Place feature_select.py and data_preprocessing.py from this repository in the same folder as run.py
Place the following folders from this repository in the folder where your run.py resides:

/test_result/
/train_result/
/model/

You do not need to include the folders' contents; they will be generated when the program executes. If contents are included, they will be replaced once the program executes.
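Since only the empty folders are needed, they can also be created with a few lines of Python instead of copying them. This is a convenience sketch, not a script shipped with the repository; the function name is hypothetical.

```python
from pathlib import Path
from typing import List

OUTPUT_DIRS = ("test_result", "train_result", "model")


def ensure_output_dirs(base: str = ".") -> List[Path]:
    """Create the output folders next to run.py if they do not exist yet."""
    created = []
    for name in OUTPUT_DIRS:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in ensure_output_dirs("."):
        print("ready:", folder)
```

Run it from the folder where run.py resides so the folders land in the expected place.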
In the folder where run.py resides, enter the following commands:

To train agent:
python -m run --alg=ddpg --env=StarTrader-v0 --network=mlp --num_timesteps=2e4

To test agent:
python -m run --alg=ddpg --env=StarTraderTest-v0 --network=mlp --num_timesteps=2e3 --load_path='./model/DDPG_trained_model_8'

If you have trained a better model, replace DDPG_trained_model_8 with your new model.

After training and testing the agent successfully, pick the first DDPG trading book from the test run, which is saved as ./test_result/trading_book_test_1.csv, or modify the filename in compare.py.
Compare the agent's performance with the benchmark index and other trading strategies:

python compare.py

Special instructions:

Depending on machine configuration, the following installations may be necessary:

pip3 install -U numpy
pip3 install opencv-python
pip3 install mujoco-py==0.5.7
pip3 install lockfile
The technical analysis library, TA-Lib, may be tricky to install on some machines. The following page is a handy guide: https://goldenjumper.wordpress.com/tag/ta-lib/

graphviz, which is required to plot the XGBoost tree diagram, can be installed with the following command:
Windows:
conda install python-graphviz
Mac/Linux:
conda install graphviz
