Code & data accompanying the NAACL2019 paper "Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases"
This code is written in python 3. You will need to install a few python packages in order to run the code.
We recommend you to use virtualenv
to manage your python packages and environments.
Please take the following steps to create a python virtual environment.
virtualenv
, install it with pip install virtualenv
.virtualenv venv
.source venv/bin/activate
.pip install -r requirements.txt
.Download the preprocessed data from here and put the data folder under the root directory.
Create a folder (e.g., runs/WebQ/
) to save model checkpoint. You can download the pretrained models from here. (Note: if you cannot access the above data and pretrained models, please download from here.)
Please modify the config files in the src/config/
folder to suit your needs. Note that you can start with modifying only the data folder (e.g., data_dir
, model_file
, pre_word2vec
) and vocab size (e.g., vocab_size
, num_ent_types
, num_relations
), and leave other hyperparameters as they are.
Go to the BAMnet/src
folder, train the BAMnet model
python train.py -config config/bamnet_webq.yml
Test the BAMnet model (with ground-truth topic entity)
python test.py -config config/bamnet_webq.yml
Train the topic entity predictor
python train_entnet.py -config config/entnet_webq.yml
Test the topic entity predictor
python test_entnet.py -config config/entnet_webq.yml
Test the whole system (BAMnet + topic entity predictor)
python joint_test.py -bamnet_config config/bamnet_webq.yml -entnet_config config/entnet_webq.yml -raw_data ../data/WebQ
Go to the BAMnet/src
folder, to prepare data for the BAMnet model, run the following cmd:
python build_all_data.py -data_dir ../data/WebQ -fb_dir ../data/WebQ -out_dir ../data/WebQ
To prepare data for the topic entity predictor model, run the following cmd:
python build_all_data.py -dtype ent -data_dir ../data/WebQ -fb_dir ../data/WebQ -out_dir ../data/WebQ
Note that in the message printed out, your will see some data statistics such as vocab_size
, num_ent_types
, num_relations
. These numbers will be used later when modifying the config files.
Download the pretrained Glove word ebeddings glove.840B.300d.zip.
Unzip the file and convert glove format to word2vec format using the following cmd:
python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2v
Fetch the pretrained Glove vectors for our vocabulary.
python build_pretrained_w2v.py -emb glove.840B.300d.w2v -data_dir ../data/WebQ -out ../data/WebQ/glove_pretrained_300d_w2v.npy -emb_size 300
If you found this code useful, please consider citing the following paper:
Yu Chen, Lingfei Wu, Mohammed J. Zaki. "Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases." In Proc. 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT2019). June 2019.
@article{chen2019bidirectional,
title={Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases},
author={Chen, Yu and Wu, Lingfei and Zaki, Mohammed J},
journal={arXiv preprint arXiv:1903.02188},
year={2019}
}