PRPN Analysis

This repo contains the output files and analysis results reported in the paper "Grammar Induction with Neural Language Models: An Unusual Replication" [1], in which we perform an in-depth analysis of the Parsing-Reading-Predict Network (PRPN) [2].

The parsed files can be downloaded here. They are named in the following way:

We also share the pretrained model that achieves the best F1 score (PRPN-LM trained on AllNLI with the language-modeling criterion), which can be downloaded here.
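
For reference, here is a minimal sketch of loading that checkpoint with PyTorch. It assumes the released file is a torch-serialized model object (as the --checkpoint flag of parse_data.py suggests); whether it is a full pickled model or a state_dict is an assumption, and the path mirrors the command shown further below.

    import torch

    # Assumption: the checkpoint is a full torch-pickled model, so the PRPN
    # model class must be importable when this runs.
    with open("path_to_model/model_lm.pt", "rb") as f:
        model = torch.load(f, map_location="cpu")

    model.eval()  # switch to evaluation mode before producing parses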

You will need the original PTB corpus, since data_ptb.py uses NLTK to read the WSJ trees; data_ptb.py is used by PRPN_UP (main_UP.py) and by parse_data.py. The original PTB corpus can be downloaded here. The vocabulary files for all models, as well as the preprocessed PTB data files used in PRPN_LM (main_LM.py), can be downloaded here.
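
As a quick sanity check that NLTK can see your PTB copy, the sketch below reads the WSJ trees directly. The directory path and the .mrg file pattern are assumptions about how your LDC distribution is laid out, not something data_ptb.py requires verbatim.

    from nltk.corpus.reader import BracketParseCorpusReader

    # Hypothetical location of the WSJ portion of the PTB; adjust to your copy.
    ptb_root = "/path/to/ptb/combined/wsj"

    # Treat every .mrg file under ptb_root as bracketed parse trees.
    reader = BracketParseCorpusReader(ptb_root, r".*\.mrg")

    # Each element of parsed_sents() is an nltk.Tree.
    tree = reader.parsed_sents()[0]
    print(tree)           # full bracketed tree
    print(tree.leaves())  # token sequence (the tree's yield)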

To produce parses using the pretrained model:

    python parse_data.py --data path_to_data --checkpoint path_to_model/model_lm.pt --seed 1111 --eval_data path_to_multinli/multinli_1.0_dev_matched.jsonl --save_eval_path save_path/parsed_MNLI.jsonl
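
To spot-check the result, the snippet below reads the first parsed example back. It assumes parse_data.py writes one JSON object per line and reuses the MultiNLI field names (sentence1, sentence1_parse); the exact output schema is an assumption here.

    import json

    # Read the first example from the parser's jsonl output.
    with open("save_path/parsed_MNLI.jsonl") as f:
        example = json.loads(next(f))

    # Field names follow the MultiNLI schema; an assumption about what
    # parse_data.py emits.
    print(example.get("sentence1"))
    print(example.get("sentence1_parse"))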

References

[1] Phu Mon Htut, Kyunghyun Cho, and Samuel R. Bowman. Grammar Induction with Neural Language Models: An Unusual Replication. To appear in Proceedings of EMNLP. 2018.

[2] Yikang Shen, Zhouhan Lin, Chin-Wei Huang, and Aaron Courville. Neural Language Modeling by Jointly Learning Syntax and Lexicon. In Proceedings of the International Conference on Learning Representations. 2018. [code]