kgsgo-dataset-preprocessor

Dataset preprocessor for the KGS Go dataset, e.g. producing input planes as described by Clark and Storkey

goal

The goal of this project is to take the data from the kgsgo website and make it available in a somewhat generic format that can be fed into any go-agnostic learning algorithm. The guiding aim is to be able to somewhat reproduce the experiments in the Clark and Storkey paper, while also somewhat targeting the Maddison et al paper.

Pre-requisites

v1 vs v2 format

v1 format

Instructions

These are written for Linux. They may need some slight tweaking for Windows.

Type:

git clone --recursive https://github.com/hughperkins/kgsgo-dataset-preprocessor.git
cd kgsgo-dataset-preprocessor
python kgs_dataset_preprocessor.py

Results

Data format of resulting file

Data processing applied

MD5sum

When I run it, I get md5sums:

850d2c91b684de45f39a205378fd7967  kgsgo-test.dat
80cfa39797fa1ea32af30191b2fb962c  kgsgo-train10k.dat

If it's different, it doesn't necessarily matter, but if it's the same, it's a good sign :-)
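
If `md5sum` is not available (for example on Windows), the same check can be sketched in Python. This is only an illustration, not part of the project: the helper names `md5_of_file` and `check_files` are made up here, and the code assumes the `.dat` files sit in the current directory, which may not be where the preprocessor writes them on your machine.

```python
import hashlib
from pathlib import Path

# Checksums copied from the listing above.
EXPECTED_MD5 = {
    "kgsgo-test.dat": "850d2c91b684de45f39a205378fd7967",
    "kgsgo-train10k.dat": "80cfa39797fa1ea32af30191b2fb962c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use flat, even for large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(expected, directory="."):
    """Map each filename to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == want) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the generated files are in the current directory.
    labels = {True: "matches", False: "differs", None: "not found"}
    for name, ok in check_files(EXPECTED_MD5).items():
        print(f"{name}: {labels[ok]}")
```

As noted above, a "differs" result is not necessarily a problem.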

v2 format

v2 format vs v1 format

After writing the v1 format as detailed above, I noticed some things I'd prefer to do differently. The v2 format therefore changes these things, without changing anything detailed above. If you continue to use kgs_dataset_preprocessor.py, the data produced will be unchanged. In addition, the filenames produced by v2 do not overwrite those produced by v1.

v2 changes the following:

Running v2 format processor

python kgs_dataset_preprocessor_v2.py

Available options:

md5 sums

When I run this, I get the following md5 sums. If these are different for you, it's not necessarily an issue. If they are the same, this is a good sign :-)

57382be81ef419a5f1b1cf2632a8debf  kgsgo-test-v2.dat
6172e980f348103be3ad06ae7f946b47  kgsgo-train10k-v2.dat
20440801e72452b6714d5dd061673973  kgsgo-trainall-v2.dat

File sizes:

$ ls -lh kgsgo-*v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 5.8M Feb  8 05:58 kgsgo-test-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu 601M Feb  8 06:13 kgsgo-train10k-v2.dat
-rw-rw-r-- 1 ubuntu ubuntu  11G Mar  7 15:58 kgsgo-trainall-v2.dat
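
On systems without `ls`, a rough equivalent of the size listing can be sketched in Python. The helpers `human_size` and `list_sizes` are hypothetical names used only for this illustration, the files are assumed to be in the current directory, and the rounding only approximates what `ls -lh` prints.

```python
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count in powers of 1024, roughly like `ls -lh`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            # Plain bytes are shown without a decimal place.
            return f"{size:.0f}{unit}" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024


def list_sizes(directory=".", pattern="kgsgo-*v2.dat"):
    """Return {filename: size in bytes} for files matching the pattern."""
    return {p.name: p.stat().st_size for p in sorted(Path(directory).glob(pattern))}


if __name__ == "__main__":
    # Assumption: the v2 files are in the current directory.
    for name, size in list_sizes().items():
        print(f"{human_size(size):>8}  {name}")
```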

Example loader

Third-party libraries used

Related projects

I'm building a convolutional network library in OpenCL, ClConvolve, with the aim of training it on this dataset.