Adversarial Examples for Evaluating Reading Comprehension Systems (EMNLP 2017)

This repository contains code for the paper:

Adversarial Examples for Evaluating Reading Comprehension Systems.
Robin Jia and Percy Liang
Empirical Methods in Natural Language Processing (EMNLP), 2017.

Note: To download our adversarial data, view all of our experiments, and find instructions for running adversarial evaluation on your own SQuAD model, please see our Codalab worksheet. This git repository only exposes the code that was used to generate some of the files on that Codalab worksheet.


Run to pull SQuAD data, GloVe vectors, Stanford CoreNLP, and some custom python utilities. Other python requirements are in requirements.txt.


The following sequence of commands generates the raw AddSent training data described in Section 4.6 of our paper.

mkdir out
# Precompute nearby words in word vector space; takes roughly 1 hour
python src/py/ glove/glove.6B.100d.txt -n 100 -f data/squad/train-v1.1.json > out/nearby_n100_glove_6B_100d.json
# Run CoreNLP on the SQuAD training data; takes roughly 1 hour, uses ~18GB memory
python src/py/ corenlp -d train
# Actually generate the raw AddSent examples; takes roughly 7 minutes, uses ~15GB memory
python src/py/ dump-highConf -d train -q
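The first command precomputes, for each word, its nearest neighbors in GloVe word vector space (-n 100 asks for 100 neighbors). The sketch below only illustrates that idea. It assumes "nearby" means highest cosine similarity. load_glove and nearby_words are hypothetical names, not functions from src/py, and the -f option is not modeled. A brute-force pure-Python scan like this is far too slow for the full glove.6B vocabulary and is meant for reading, not for replacing the real script.

```python
import heapq
import math
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text format: a word followed by its space-separated components."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def _normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def nearby_words(vectors: Dict[str, List[float]], word: str, n: int = 100) -> List[str]:
    """Hypothetical helper: the n words most cosine-similar to `word`, excluding itself."""
    query = _normalize(vectors[word])
    scored = (
        (sum(a * b for a, b in zip(query, _normalize(vec))), other)
        for other, vec in vectors.items()
        if other != word
    )
    # nlargest returns (score, word) pairs sorted by descending similarity.
    return [w for _, w in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    # Toy 2-dimensional vectors in GloVe's text format (not real GloVe data).
    toy = ["king 1.0 0.9", "queen 0.9 1.0", "apple -1.0 0.2", "man 0.8 0.5"]
    vecs = load_glove(toy)
    print(nearby_words(vecs, "king", n=2))  # ['queen', 'man']
```

With the real data, the same functions could be fed the lines of glove/glove.6B.100d.txt, though building the neighbor lists for every word this way would take far longer than the hour quoted above.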

The dump-highConf command generates three files with the prefix train-convHighConf in the out directory, including train-convHighConf.json and train-convHighConf-mturk.tsv; the latter is in a format that can be processed by the scripts in the mturk directory.
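After the last step finishes, a quick sanity check is to list what landed in out. This is a minimal sketch; list_outputs is a hypothetical helper, and it only checks file names (the count of three and the two names given above), not file contents.

```python
import glob
import os
from typing import List


def list_outputs(out_dir: str = "out", prefix: str = "train-convHighConf") -> List[str]:
    """Hypothetical helper: sorted base names of files in out_dir starting with prefix."""
    pattern = os.path.join(out_dir, glob.escape(prefix) + "*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


if __name__ == "__main__":
    files = list_outputs()
    print(files)
    # Three files are expected, including these two.
    expected = {"train-convHighConf.json", "train-convHighConf-mturk.tsv"}
    missing = expected - set(files)
    if len(files) != 3 or missing:
        print("Unexpected output set; missing:", sorted(missing))
```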

Other one-off scripts are described in their docstrings.