Baseline Models for MultiNLI Corpus

This is the code we used to establish baselines for the MultiNLI corpus introduced in A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference.

Data

The MultiNLI and SNLI corpora are both distributed as JSON lines files and as tab-separated values files. Both can be downloaded here.
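As a rough illustration of the JSON lines format, the sketch below parses one JSON object per line. It is not the loader used in this repository. The function names and the sample records are hypothetical, and the only field names used are pairID and gold_label, which appear elsewhere in this README.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON lines strings into a list of dicts.

    Blank lines are skipped; every other line must hold one JSON object.
    """
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path):
    """Read a JSON lines file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


# Hypothetical records, used only to show the shape of the data.
sample_lines = [
    '{"pairID": "1", "gold_label": "entailment"}\n',
    "\n",
    '{"pairID": "2", "gold_label": "neutral"}\n',
]
examples = parse_jsonl(sample_lines)
print(len(examples), examples[0]["gold_label"])
```

Because each line is an independent JSON object, a large corpus file can also be streamed line by line rather than loaded into memory at once.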

Models

We present three baseline neural network models. These range from a bare-bones model (CBOW) to an elaborate model that has achieved state-of-the-art performance on the SNLI corpus (ESIM).

We use dropout for regularization in all three models.

Training and Testing

Training settings

The models can be trained in three different settings. Each setting has its own training script.

Command line flags

To start training with any of the training scripts, you must supply a couple of required command-line flags and may supply any of an array of optional flags. The code for all flags can be found in parameters.py. All the parameters set in parameters.py are printed to the log file every time the training script is launched.

Required flags,

Optional flags,

*Dev-sets are currently used for testing on MultiNLI since the test-sets have not been released.
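For orientation, the flags that appear in this README's sample commands could be parsed along the following lines. This is a hypothetical argparse sketch, not the contents of parameters.py. The flag types and help strings are assumptions, flags not shown in the sample commands are omitted, and the ../logs default is taken from the description of predictions.py.

```python
import argparse


def build_parser():
    """Build a parser for the flags seen in this README's sample commands.

    Assumption: types and help strings are guesses for illustration only.
    """
    parser = argparse.ArgumentParser(description="Hypothetical flag sketch")
    # The two positional arguments come first in every sample command.
    parser.add_argument("model_type", help="e.g. cbow or esim")
    parser.add_argument("model_name", help="name used for logs and checkpoints")
    # Optional flags, as they appear in the sample commands.
    parser.add_argument("--genre", type=str, default=None)
    parser.add_argument("--alpha", type=float, default=None)
    parser.add_argument("--emb_train", action="store_true")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--logpath", type=str, default="../logs")
    parser.add_argument("--ckptpath", type=str, default="../logs")
    return parser


args = build_parser().parse_args(
    ["esim", "petModel-2", "--genre", "travel", "--emb_train", "--test"]
)
print(args.model_type, args.model_name, args.genre, args.emb_train, args.test)
```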

Other parameters

Remaining parameters, such as the hidden layer size, the word embedding size, and the minibatch size, can be changed directly in parameters.py. By default, the hidden layer and word embedding sizes are both set to 300, and the minibatch size (batch_size in the code) is set to 32.

Sample commands

To execute any of the following sample commands, you must be in the "python" folder.

Testing models

On dev set,

To test a trained model, simply add the test flag to the command used for training. The best checkpoint will be loaded and used to evaluate the model's performance on the MultiNLI dev-sets, SNLI test-set, and the dev-set for each genre in MultiNLI.

For example,

PYTHONPATH=$PYTHONPATH:. python train_genre.py esim petModel-2 --genre travel --emb_train --test

With the test flag, the train_mnli.py script will also generate a CSV of predictions for the unlabeled matched and mismatched test-sets.

Results for unlabeled test sets,

To get a CSV of predicted results for unlabeled test sets, use predictions.py. This script requires the same flags as the training scripts. You must enter the model_type and model_name, and also the paths to the saved checkpoint and log files if they differ from the default (both paths default to ../logs).

Here is a sample command,

PYTHONPATH=$PYTHONPATH:. python predictions.py esim petModel-1 --alpha 0.15 --emb_train --logpath ../logs_keep --ckptpath ../logs_keep

This script will create a CSV with two columns: pairID and gold_label.
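The two-column layout can be pictured with the short sketch below, which writes a pairID and gold_label header followed by one row per prediction. It only illustrates the output format and is not the code in predictions.py. The function name and the sample rows are hypothetical.

```python
import csv
import io


def write_predictions(predictions, out_file):
    """Write (pairID, predicted label) pairs as a two-column CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["pairID", "gold_label"])  # header row
    for pair_id, label in predictions:
        writer.writerow([pair_id, label])


# Hypothetical predictions, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_predictions([("1", "entailment"), ("2", "neutral")], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")`, as the csv module documentation recommends.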

Checkpoints

We maintain two checkpoints: the most recent checkpoint and the best checkpoint. Every 500 steps, the most recent checkpoint is updated, and we test to see if the dev-set accuracy has improved by at least 0.04%. If the accuracy has gone up by at least 0.04%, then the best checkpoint is updated.
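The update rule above can be sketched as follows. This is a minimal illustration rather than the training loop in this repository. It assumes accuracy is measured in percent and that "0.04%" means an absolute gain of 0.04 percentage points. All function and variable names are hypothetical.

```python
def should_update_best(dev_acc, best_acc, min_gain=0.04):
    """Return True if dev accuracy (in percent) rose by at least min_gain points.

    Assumption: the 0.04% threshold is an absolute gain in percentage points.
    """
    return dev_acc - best_acc >= min_gain


def run_checkpointing(dev_accs_by_step, interval=500):
    """Replay (step, dev accuracy) pairs and record when the best checkpoint changes.

    Steps that are not multiples of `interval` are ignored, since checkpoints
    are only considered every `interval` steps.
    """
    best_acc = 0.0
    events = []
    for step, acc in dev_accs_by_step:
        if step % interval != 0:
            continue
        # The most recent checkpoint would be saved here on every interval.
        updated_best = should_update_best(acc, best_acc)
        if updated_best:
            best_acc = acc  # the best checkpoint would be overwritten here
        events.append((step, updated_best))
    return best_acc, events


# Hypothetical accuracy trace, for illustration only.
best, events = run_checkpointing([(500, 70.0), (1000, 70.02), (1500, 70.1)])
print(best, events)
```

Requiring a minimum gain keeps the best checkpoint from being rewritten on improvements that are likely noise.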

Annotation Tags

The script which was used to determine the percentage of annotation tags is available in this repository, within the subfolder "python" under the name "autotags.py". It takes a parsed corpus file (e.g., a dev set file) and reports the percentages of annotation tags in that file. You should also update your paths in the script to reflect your local file organization.

License

Copyright 2018, New York University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.