Deep Q-Learning Agent for Traffic Signal Control

A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.

I have uploaded this here to help anyone who is searching for a good starting point for deep reinforcement learning with SUMO. This code is a simplified version of the code used for my master's thesis. I hope you find this repository useful for your project.

Improved version - 12 Jan 2020

Changelog:

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. In my opinion, these are the easiest steps to follow to be able to run the algorithm starting from scratch. A computer with an NVIDIA GPU is strongly recommended.

  1. Download Anaconda (official site) and install.
  2. Download SUMO (official site) and install.
  3. Follow this short guide to install tensorflow-gpu correctly and problem-free. Basically, the guide tells you to open Anaconda Prompt, or any terminal, and type the following commands:
    conda create --name tf_gpu
    activate tf_gpu
    conda install tensorflow-gpu

I've used the following software versions: Python 3.7, SUMO traffic simulator 1.2.0, and TensorFlow 2.0.

Running the algorithm

You don't need to open any SUMO software, since everything is loaded and run in the background. If you want to watch the training process as it runs, set the parameter gui to True in the file training_settings.ini. Keep in mind that viewing the simulation is much slower than training in the background, and you also need to close SUMO-GUI every time an episode ends, which is not practical.

The file training_settings.ini contains all the parameters used by the agent in the simulation. The default parameters have not been carefully tuned, so some experimentation will likely improve the agent's performance.
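As a rough illustration of how a settings file like this can be read, here is a minimal sketch using Python's standard configparser module. Only the gui parameter comes from this README; the section name [simulation] and the key total_episodes are assumptions made for the example, and the real training_settings.ini may be organized differently.

```python
import configparser

# Hypothetical excerpt: the section name and total_episodes are assumptions,
# only the gui parameter is mentioned in this README.
EXAMPLE_INI = """
[simulation]
gui = False
total_episodes = 100
"""

def load_settings(ini_text):
    """Parse ini-formatted text into a plain dict of typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["simulation"]
    return {
        "gui": section.getboolean("gui"),  # accepts True/False, yes/no, on/off
        "total_episodes": section.getint("total_episodes"),
    }

settings = load_settings(EXAMPLE_INI)
print(settings)
```

Converting values to typed Python objects in one place means the rest of the code does not have to deal with raw strings.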

When the training ends, the results will be stored in "./model/model_x/" where x is an increasing integer starting from 1, generated automatically. Results will include some graphs, the data used to generate the graphs, the trained neural network, and a copy of the ini file where the agent settings are.
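The automatic numbering of the result folders could be implemented along these lines. This is only a sketch of the idea, not necessarily how the repository does it; the function name next_model_dir is hypothetical, and a temporary directory stands in for "./model/".

```python
import os
import tempfile

def next_model_dir(models_root):
    """Create and return models_root/model_x, where x is one more than the
    highest existing model number (starting from 1)."""
    os.makedirs(models_root, exist_ok=True)
    existing = []
    for name in os.listdir(models_root):
        prefix, _, suffix = name.partition("_")
        if prefix == "model" and suffix.isdigit():
            existing.append(int(suffix))
    new_dir = os.path.join(models_root, "model_" + str(max(existing, default=0) + 1))
    os.makedirs(new_dir)
    return new_dir

# Demonstration in a temporary directory instead of ./model/
with tempfile.TemporaryDirectory() as root:
    first = next_model_dir(root)
    second = next_model_dir(root)
    created = [os.path.basename(first), os.path.basename(second)]

print(created)
```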

Now you can finally test the trained agent. To do so, run the file testing_main.py. The test consists of a single simulation episode, and the results will be stored in "./model/model_x/test/" where x is the number of the model that you chose to test. The number of the model to test and other useful parameters are contained in the file testing_settings.ini.

Training time: about 27 seconds per episode, or roughly 45 minutes for 100 episodes, on a computer equipped with an i7-3770K, 8 GB RAM, an NVIDIA GTX 970, and an SSD.

The code structure

The main file is training_main.py. It handles the main loop that starts an episode on every iteration. It also saves the network and 3 graphs: negative reward, cumulative wait time, and average queues.

Overall the algorithm is divided into classes that handle different parts of the training.

In the "intersection" folder there is a file called environment.net.xml, which defines the structure of the environment and was created using SUMO NetEdit. The other file, sumo_config.sumocfg, links the environment file to the route file.

The settings explained

The settings used during the training and contained in the file training_settings.ini are the following:

The settings used during testing, contained in the file testing_settings.ini, are the following (some of them have to match the ones used in the corresponding training):

The Deep Q-Learning Agent

Framework: Q-Learning with deep neural network.
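For readers new to Q-learning, the core idea is that the network is trained to predict, for each action, the immediate reward plus the discounted best value of the next state. The snippet below is a generic sketch of that target computation in plain Python. It is the textbook Q-learning rule, not code taken from this repository, and the discount factor, the reward, and the four Q-values are made-up numbers.

```python
def q_learning_target(reward, next_q_values, gamma, done=False):
    """Textbook Q-learning target: r + gamma * max_a' Q(s', a').

    For a terminal transition there is no next state, so the target
    is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Illustrative values only: gamma and the four Q-values are assumptions.
target = q_learning_target(
    reward=-3.0,
    next_q_values=[-2.0, -1.0, -4.0, -5.0],
    gamma=0.75,
)
print(target)
```

In a deep Q-learning setup, the values in next_q_values would come from the neural network's prediction for the next state, and the computed target is used as the training label for the action that was taken.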

Context: traffic signal control of 1 intersection.

Environment: a 4-way intersection with 4 incoming lanes and 4 outgoing lanes per arm. Each arm is 750 meters long. Each incoming lane defines the possible directions that a car can follow: the left-most lane is dedicated to left turns only; the right-most lane is dedicated to right turns and going straight; the two middle lanes are dedicated to going straight only. The layout of the traffic light system is as follows: the left-most lane has a dedicated traffic light, while the other three lanes share the same traffic light.

Traffic generation: For every episode, 1000 cars are created. The car arrival timing follows a Weibull distribution with shape 2 (a fast increase in arrivals until a peak just before mid-episode, then a slow decrease). 75% of the spawned vehicles will go straight, and 25% will turn left or right. Every vehicle has the same probability of being spawned at the beginning of each arm. Cars are generated randomly in every episode, so it is not possible to have two equivalent episodes in terms of vehicle arrival layout.
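The traffic generation described above can be sketched with the Python standard library alone. This is a simplified illustration rather than the repository's generator: the episode length of 3600 seconds, the arm names, the min-max rescaling of the Weibull samples onto the episode, and the even split between left and right turns are all assumptions.

```python
import random

ARMS = ["north", "east", "south", "west"]  # hypothetical arm names

def generate_traffic(n_cars=1000, episode_seconds=3600, seed=None):
    """Return a list of cars with a departure time, origin arm and movement."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    raw = sorted(rng.weibullvariate(1.0, 2.0) for _ in range(n_cars))
    lo, hi = raw[0], raw[-1]
    cars = []
    for t in raw:
        # Rescale the sample onto [0, episode_seconds] (assumed approach)
        depart = round((t - lo) / (hi - lo) * episode_seconds)
        # 75% go straight; the remaining 25% turn (left/right split assumed even)
        if rng.random() < 0.75:
            movement = "straight"
        else:
            movement = rng.choice(["left", "right"])
        cars.append({"depart": depart, "arm": rng.choice(ARMS), "movement": movement})
    return cars

cars = generate_traffic(seed=0)
print(len(cars), cars[0], cars[-1])
```

Passing a different seed (or none) for each episode gives a different arrival layout every time, which is what prevents two episodes from being equivalent.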

Agent (Traffic Signal Control System - TLCS):

Author

If you need further information or have suggestions, please open an issue on the issues page, look at my master's thesis here, or, as a last resort, write me an e-mail at info@andreavidali.com.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.