

Dataset preprocessor for the KGS Go dataset, e.g. producing the Clark and Storkey input planes


The goal of this project is to take the data from the kgsgo website, and make it available in a somewhat generic format that can be fed into any go-agnostic learning algorithm. The guideline for this project is to be able to roughly reproduce the experiments in the Clark and Storkey paper, and also to somewhat target the Maddison et al paper.
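Clark and Storkey encode a position as a stack of binary 19x19 feature planes, for example separating each side's stones by liberty count (1, 2, or 3-or-more liberties). The sketch below shows one plausible way to compute such liberty-count planes; it is illustrative only, and the exact plane set and encoding used by this preprocessor are described in the format sections below.

```python
import numpy as np

SIZE = 19  # standard board size used by this dataset


def group_and_liberties(board, r, c):
    """Flood-fill the group containing (r, c); return (stones, liberty count)."""
    color = board[r, c]
    stones, libs, stack = set(), set(), [(r, c)]
    while stack:
        y, x = stack.pop()
        if (y, x) in stones:
            continue
        stones.add((y, x))
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < SIZE and 0 <= nx < SIZE:
                if board[ny, nx] == 0:
                    libs.add((ny, nx))
                elif board[ny, nx] == color:
                    stack.append((ny, nx))
    return stones, len(libs)


def liberty_planes(board, color):
    """Three binary planes: stones of `color` with 1, 2, or >=3 liberties.

    Assumes a legal position, so every group has at least one liberty.
    """
    planes = np.zeros((3, SIZE, SIZE), dtype=np.uint8)
    seen = set()
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r, c] == color and (r, c) not in seen:
                stones, nlibs = group_and_liberties(board, r, c)
                seen |= stones
                planes[min(nlibs, 3) - 1][tuple(zip(*stones))] = 1
    return planes
```

A full feature extractor would compute these planes for both players, plus any extra planes (e.g. ko) the chosen paper specifies.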


v1 vs v2 format

v1 format


These instructions are written for Linux. They may need some slight tweaking for Windows.


git clone --recursive
cd kgsgo-dataset-preprocessor


Data format of resulting file

Data processing applied


When I run it, I get md5sums:

850d2c91b684de45f39a205378fd7967  kgsgo-test.dat
80cfa39797fa1ea32af30191b2fb962c  kgsgo-train10k.dat

If yours differ, it doesn't necessarily matter, but if they match, that's a good sign :-)
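You can let md5sum do the comparison for you by saving the expected sums to a file and running it in check mode. The snippet below demonstrates the mechanism on a toy file; substitute the kgsgo sums and filenames listed above.

```shell
# Demonstrate md5sum check mode on a small sample file.
printf 'hello\n' > sample.dat
echo "b1946ac92492d2347c6235b4d2611184  sample.dat" > expected.md5
md5sum -c expected.md5   # prints "sample.dat: OK" on a match
```

For the real files, put each line from the list above into expected.md5 instead.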

v2 format

v2 format vs v1 format

After writing the v1 format as detailed above, I noticed some things I'd prefer to do differently. The v2 format modifies these things, without changing anything detailed above: if you continue to use the v1 processor, the data it produces will be unchanged. In addition, the filenames produced by v2 do not overwrite those produced by the earlier version.

v2 changes the following:

Running v2 format processor


Available options:

md5 sums

When I run this, I get the following md5 sums. If yours differ, it's not necessarily an issue; if they match, that's a good sign :-)

57382be81ef419a5f1b1cf2632a8debf  kgsgo-test-v2.dat
6172e980f348103be3ad06ae7f946b47  kgsgo-train10k-v2.dat
20440801e72452b6714d5dd061673973  kgsgo-trainall-v2.dat

File sizes:

$ ls -lh kgsgo-*v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 5.8M Feb  8 05:58 kgsgo-test-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 601M Feb  8 06:13 kgsgo-train10k-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu  11G Mar  7 15:58 kgsgo-trainall-v2.dat

Example loader
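As a sketch of how a go-agnostic learner might consume a binary record file: the record layout below (a little-endian 2-byte label followed by a fixed number of byte-valued 19x19 planes) is an assumption for illustration, not the actual v1/v2 spec; adapt the constants and parsing to the real format documented above.

```python
# Illustrative loader sketch. NUM_PLANES and the record layout are
# assumptions, not the actual kgsgo v1/v2 on-disk format.
import struct

import numpy as np

BOARD_SIZE = 19
NUM_PLANES = 8  # hypothetical number of input planes


def read_records(path, num_records):
    """Yield (label, planes) tuples from a fixed-size binary record file."""
    plane_bytes = NUM_PLANES * BOARD_SIZE * BOARD_SIZE
    with open(path, 'rb') as f:
        for _ in range(num_records):
            label = struct.unpack('<H', f.read(2))[0]
            planes = np.frombuffer(f.read(plane_bytes), dtype=np.uint8)
            yield label, planes.reshape(NUM_PLANES, BOARD_SIZE, BOARD_SIZE)
```

A real loader would read the record count and plane count from the file header rather than hard-coding them.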

Third-party libraries used

Related projects

I'm building a convolutional network library in OpenCL, which aims to train on this dataset, at ClConvolve