Our paper is at https://arxiv.org/pdf/2002.03230.pdf
First install the dependencies via conda:
Then install the package itself by running pip install .
The molecule generation code is in the generation/ folder.
The example training file is data/qed/train_pairs.txt.
The example test file is data/qed/test.txt.
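The file name suggests that each line of the training file pairs a source molecule with a target molecule. The sketch below parses such a file under that assumption (two whitespace-separated SMILES strings per line). parse_pairs and read_pairs are hypothetical helpers, not part of this repository, and the real layout should be checked against the example file.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse molecule pairs, ASSUMING two whitespace-separated SMILES per line."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            # Flags lines that do not match the assumed two-column layout.
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


def read_pairs(path: str) -> List[Tuple[str, str]]:
    """Read and parse a pair file such as data/qed/train_pairs.txt."""
    with Path(path).open() as f:
        return parse_pairs(f)
```

For example, read_pairs("data/qed/train_pairs.txt")[:3] would show the first three pairs; a ValueError means the file does not follow the assumed format.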
Extract a substructure vocabulary from a given set of molecules:
python get_vocab.py < data/qed/mols.txt > vocab.txt
Replace data/qed/mols.txt with your own molecule data file.
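If you drive these steps from a Python script rather than a shell, the < and > redirections map onto file handles passed as stdin and stdout to subprocess.run. This is a minimal sketch; run_redirected is a hypothetical helper, and only the get_vocab.py invocation shown above comes from this README.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_redirected(args: Sequence[str], stdin_path: str, stdout_path: str) -> None:
    """Run `python <args...> < stdin_path > stdout_path`, raising if the script fails."""
    # Create the output directory if needed (a no-op for the current directory).
    Path(stdout_path).parent.mkdir(parents=True, exist_ok=True)
    with open(stdin_path, "rb") as fin, open(stdout_path, "wb") as fout:
        # sys.executable reuses the interpreter (e.g. the conda environment) running this script.
        subprocess.run([sys.executable, *args], stdin=fin, stdout=fout, check=True)
```

The vocabulary step above then becomes run_redirected(["get_vocab.py"], "data/qed/mols.txt", "vocab.txt"). With check=True, a failing script raises subprocess.CalledProcessError instead of silently leaving a partial vocab.txt behind.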
Preprocess training data:
python preprocess.py --train data/qed/train_pairs.txt --vocab data/qed/vocab.txt --ncpu 16 < data/qed/train_pairs.txt
mkdir train_processed
mv tensor* train_processed/
Replace the --train and --vocab arguments with your own training and vocabulary files (for example, the vocab.txt generated above).
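The mkdir and mv commands collect every file whose name starts with tensor in the current directory into train_processed/. An equivalent sketch in Python, for environments without a POSIX shell; collect_tensors is a hypothetical helper name:

```python
import shutil
from pathlib import Path
from typing import List


def collect_tensors(src_dir: str = ".", dest_dir: str = "train_processed") -> List[Path]:
    """Move regular files matching tensor* from src_dir into dest_dir (mkdir + mv tensor*)."""
    dest = Path(dest_dir)
    # Unlike plain `mkdir`, this does not fail if the directory already exists.
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(src_dir).glob("tensor*")):
        if path.is_file():
            target = dest / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Calling collect_tensors() from the directory where preprocess.py was run mirrors the two shell commands and returns the new paths, which makes it easy to check that preprocessing actually produced output.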
Train the model:
mkdir models/
python gnn_train.py --train train_processed/ --vocab data/qed/vocab.txt --save_dir models/
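Training writes checkpoints into models/, and the prediction step refers to one of them as models/model.5. Assuming checkpoints follow a model.<N> naming scheme (inferred from that single example), the hypothetical helper below picks the one with the highest number:

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMED naming scheme model.<N>, inferred from the models/model.5 example.
CHECKPOINT_RE = re.compile(r"^model\.(\d+)$")


def latest_checkpoint(save_dir: str = "models") -> Optional[Path]:
    """Return the checkpoint with the highest numeric suffix, or None if there is none."""
    best: Optional[Path] = None
    best_n = -1
    for path in Path(save_dir).iterdir():
        m = CHECKPOINT_RE.match(path.name)
        if m and int(m.group(1)) > best_n:
            best_n = int(m.group(1))
            best = path
    return best
```

Comparing the suffixes as integers matters here: sorting the names as strings would place model.9 after model.10.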
Make predictions on your lead compounds (any model checkpoint can be used; here we use model.5 for illustration):
python decode.py --test data/qed/valid.txt --vocab data/qed/vocab.txt --model models/model.5 --num_decode 20 > results.csv
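The layout of results.csv is not documented here. Assuming each line holds an input molecule followed by one decoded molecule (whitespace-separated by default; pass sep="," if the file turns out to be comma-separated), the hypothetical helper below groups the decoded candidates by input. Under that assumption, with --num_decode 20 each lead compound would map to its list of candidates.

```python
from typing import Dict, Iterable, List, Optional


def group_decodes(lines: Iterable[str], sep: Optional[str] = None) -> Dict[str, List[str]]:
    """Group decoded molecules by input, ASSUMING 'input<sep>output' on each line."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, decoded = fields[0], fields[1]
        # dict preserves insertion order, so inputs appear in file order.
        groups.setdefault(source, []).append(decoded)
    return groups
```

An open file object can be passed directly, e.g. group_decodes(open("results.csv")); len(set(candidates)) for each input then gives the number of distinct molecules proposed for that compound.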