Neural Combinatorial Optimization with RL

TensorFlow implementation of:
Neural Combinatorial Optimization with Reinforcement Learning, Bello I., Pham H., Le Q. V., Norouzi M., Bengio S.,
for the TSP with Time Windows (TSP-TW),
and Learning Heuristics for the TSP by Policy Gradient, Deudon M., Cournut P., Lacoste A., Adulyasak Y. and Rousseau L.M.,
for the Traveling Salesman Problem (TSP) (final release here).

Model

The neural network consists of an RNN or self-attentive encoder-decoder, with an attention module connecting the decoder to the encoder (via a "pointer"). The model is trained by policy gradient (REINFORCE, Williams, 1992).
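To make the "pointer" concrete, here is a minimal sketch of one decoding step, assuming the additive attention form used in the pointer-network literature (score_i = v . tanh(W_ref e_i + W_q q)), with already visited cities masked out. The function name `pointer_probs` and the parameters `W_ref`, `W_q` and `v` are illustrative assumptions and are not taken from this repository's code.

```python
import math

def pointer_probs(enc, query, W_ref, W_q, v, visited):
    """Distribution over cities for one decoding step (illustrative sketch).

    enc: list of n encoder vectors, query: decoder state vector,
    W_ref, W_q: matrices given as lists of rows, v: vector,
    visited: set of city indices that may no longer be selected.
    """
    if len(visited) >= len(enc):
        raise ValueError("at least one city must be unvisited")

    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    q_proj = matvec(W_q, query)
    scores = []
    for i, e in enumerate(enc):
        if i in visited:
            scores.append(-math.inf)  # mask: probability becomes exactly 0
            continue
        r_proj = matvec(W_ref, e)
        scores.append(sum(vk * math.tanh(r + q)
                          for vk, r, q in zip(v, r_proj, q_proj)))

    # Numerically stable softmax over the masked scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

# Toy example: 3 cities, city 0 already visited, identity projections.
enc = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
probs = pointer_probs(enc, [0.5, 0.5], W, W, [1.0, 1.0], visited={0})
```

In the trained model the projections would be learned and the query would come from the decoder state. The sketch only shows how masking and the softmax turn attention scores into a choice over the remaining cities.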

Requirements

Architecture

(work in progress)

Usage

TSP

NB: Make sure ./save/20/model exists (create the folder otherwise).
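One way to create the folder from Python before training, as a small sketch. The path is the one mentioned in the note above. Nothing here is part of the repository's own scripts.

```python
import os

# Create ./save/20/model (and any missing parents); no error if it already exists.
save_dir = os.path.join("save", "20", "model")
os.makedirs(save_dir, exist_ok=True)
```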

TSP-TW

NB: Make sure the save_to folders exist.

Results

TSP

Sampling 128 permutations with the Self-Attentive Encoder + Pointer Decoder:

[Figure: Self_Net_TSP20]
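The "sample N permutations, keep the best" procedure can be sketched as follows. This is an illustration under stated assumptions: `best_of_samples` and `uniform_policy` are hypothetical names, and the uniform random policy is only a stand-in for the trained network, which would sample tours from its learned distribution instead.

```python
import math
import random

def tour_length(coords, tour):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

def best_of_samples(coords, sample_tour, n_samples=128):
    """Draw n_samples tours from sample_tour and return the shortest one."""
    best_tour, best_len = None, math.inf
    for _ in range(n_samples):
        tour = sample_tour(len(coords))
        length = tour_length(coords, tour)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Stand-in policy (assumption): a uniformly random permutation.
rng = random.Random(0)

def uniform_policy(n):
    tour = list(range(n))
    rng.shuffle(tour)
    return tour

coords = [(rng.random(), rng.random()) for _ in range(20)]
best_tour, best_len = best_of_samples(coords, uniform_policy, n_samples=128)
```

With a trained policy, drawing more samples trades inference time for a better chance of finding a short tour.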

TSP-TW

Sampling 256 permutations with the RNN Encoder + Pointer Decoder, followed by 2-opt post-processing on the best tour:
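For readers unfamiliar with 2-opt, here is a minimal sketch of the local search on a plain Euclidean tour. It is not the repository's implementation, and it makes one important simplifying assumption: time windows are ignored. A TSP-TW variant would also have to check that reversing a segment keeps the tour feasible.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

def two_opt(coords, tour):
    """Repeatedly reverse segments while doing so shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city; nothing to swap
                a, b = coords[tour[i]], coords[tour[i + 1]]
                c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
                # Gain from replacing edges (a,b),(c,d) with (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

# Unit square visited in a self-crossing order; 2-opt removes the crossing.
square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
improved_tour = two_opt(square, [0, 1, 2, 3])
```

On this toy instance the crossing tour has length 2 + 2√2, and the uncrossed tour returned by `two_opt` follows the perimeter of the square.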

Authors

Michel Deudon / @mdeudon

Pierre Cournut / @pcournut

References

Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.