XLNet is a generalized autoregressive pretraining method proposed by CMU and Google Brain that outperforms BERT on 20 NLP tasks, including question answering, natural language inference, sentiment analysis, and document ranking. XLNet is motivated by the respective strengths and weaknesses of autoregressive and autoencoding methods and aims to overcome the limitations of both. It uses a permutation language modeling objective to learn bidirectional context, and it integrates ideas from Transformer-XL into its model architecture. This project aims to provide extensions built on top of XLNet and to bring its power to other NLP tasks such as NER and NLU.
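The permutation objective does not shuffle the input. XLNet samples a factorization order over positions and predicts each token from the tokens that precede it in that order, which is implemented purely through attention masks. The sketch below shows this idea for XLNet's two attention streams. The content stream may see its own token, while the query stream, which makes the prediction, may not. This is a simplified sketch: it omits partial prediction and Transformer-XL memory, and the function name is hypothetical, not part of this repository.

```python
import random


def permutation_masks(seq_len, seed=0):
    """Sample a factorization order and build simplified two-stream masks.

    content_mask[i][j] is True if position i may attend to position j in the
    content stream (j comes no later than i in the sampled order).
    query_mask[i][j] is True if position i may attend to j in the query
    stream (j comes strictly earlier, so i never sees its own token).
    """
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)  # factorization order z; token positions stay fixed
    # rank[p] is the index of position p within the factorization order
    rank = {pos: idx for idx, pos in enumerate(order)}
    content_mask = [[rank[j] <= rank[i] for j in range(seq_len)]
                    for i in range(seq_len)]
    query_mask = [[rank[j] < rank[i] for j in range(seq_len)]
                  for i in range(seq_len)]
    return order, content_mask, query_mask
```

Averaged over many sampled orders, every position eventually conditions on context from both sides. This is how the objective captures bidirectional context while remaining autoregressive.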
Figure 1: Illustrations of fine-tuning XLNet on different tasks
```shell
python prepro/prepro_conll.py \
  --data_format json \
  --input_file data/ner/conll2003/raw/eng.xxx \
  --output_file data/ner/conll2003/xxx-conll2003/xxx-conll2003.json
```
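In the raw CoNLL-2003 files, each line holds one token with four space-separated columns: word, POS tag, chunk tag, and NER tag. Blank lines separate sentences, and `-DOCSTART-` lines separate documents. Below is a minimal sketch of the sentence-grouping step that a preprocessing script like this has to perform. The output schema (`{"text", "label"}`) and the function names are hypothetical and are not taken from `prepro/prepro_conll.py`.

```python
import json


def read_conll2003(lines):
    """Group CoNLL-2003 lines ("word POS chunk NER") into sentences.

    Blank lines end a sentence; -DOCSTART- lines mark document boundaries
    and are skipped. Returns a list of {"text": ..., "label": ...} dicts
    (a hypothetical schema, not necessarily what prepro_conll.py emits).
    """
    examples, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:  # close the current sentence
                examples.append({"text": " ".join(tokens), "label": " ".join(tags)})
                tokens, tags = [], []
            continue
        fields = line.split()
        tokens.append(fields[0])  # the word is the first column
        tags.append(fields[-1])   # the NER tag is the last column
    if tokens:  # flush the final sentence if there is no trailing blank line
        examples.append({"text": " ".join(tokens), "label": " ".join(tags)})
    return examples


def convert(input_path, output_path):
    """Read one raw CoNLL-2003 file and write the examples as JSON."""
    with open(input_path, encoding="utf-8") as fin:
        examples = read_conll2003(fin)
    with open(output_path, "w", encoding="utf-8") as fout:
        json.dump(examples, fout)
```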
```shell
CUDA_VISIBLE_DEVICES=0 python run_ner.py \
  --spiece_model_file=model/cased_L-24_H-1024_A-16/spiece.model \
  --model_config_path=model/cased_L-24_H-1024_A-16/xlnet_config.json \
  --init_checkpoint=model/cased_L-24_H-1024_A-16/xlnet_model.ckpt \
  --task_name=conll2003 \
  --random_seed=100 \
  --predict_tag=xxxxx \
  --data_dir=data/ner/conll2003 \
  --output_dir=output/ner/conll2003/data \
  --model_dir=output/ner/conll2003/checkpoint \
  --export_dir=output/ner/conll2003/export \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --num_hosts=1 \
  --num_core_per_host=1 \
  --learning_rate=2e-5 \
  --train_steps=2500 \
  --warmup_steps=100 \
  --save_steps=500 \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --do_export=true
```
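You can roughly sanity-check the step budget by estimating the number of passes over the training data. This estimate assumes one training example per sentence, the 14,041 sentences of the standard CoNLL-2003 English training split, and that `train_batch_size` is the global batch size (true here with a single core). If the script splits or packs sentences differently, the figure changes.

$$
\text{epochs} \approx \frac{\text{train\_steps} \times \text{train\_batch\_size}}{N_{\text{train}}} = \frac{2500 \times 32}{14{,}041} \approx 5.7
$$

Under the same assumptions, the Table 1 setting (batch size 16, 4,000 steps) gives $16 \times 4000 / 14{,}041 \approx 4.6$ epochs.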
```shell
tensorboard --logdir=output/ner/conll2003
```
```shell
docker run -p 8500:8500 \
  -v "$PWD/output/ner/conll2003/export/xxxxx:/models/ner" \
  -e MODEL_NAME=ner \
  -t tensorflow/serving
```
Figure 2: Illustrations of fine-tuning XLNet on the CoNLL2003-NER task
CoNLL2003 - NER | Avg. (5-run) | Best |
---|---|---|
Precision | 91.36 ± 0.50 | 92.14 |
Recall | 92.95 ± 0.24 | 93.20 |
F1 Score | 92.15 ± 0.35 | 92.67 |
Table 1: Test set performance of the fine-tuned XLNet-large model on the CoNLL2003-NER task, with settings: batch size = 16, max sequence length = 128, learning rate = 2e-5, num steps = 4,000
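The precision, recall, and F1 in Table 1 are presumably entity-level scores in the style of `conlleval`. Under that scoring, a predicted entity counts only if both its span and its type match a gold entity exactly, and F1 is the harmonic mean of precision and recall. As a check, the averaged figures give 2 × 91.36 × 92.95 / (91.36 + 92.95) ≈ 92.15. Below is a minimal sketch under these assumptions. The function names are hypothetical. The span extractor accepts BIO (IOB2) tags and also IOB1-style sequences, where an `I-` tag may open an entity.

```python
def bio_spans(tags):
    """Extract (type, start, end) entity spans; end is exclusive.

    An I- tag that does not continue an open entity of the same type starts
    a new span, so both IOB2 and IOB1-style sequences are handled.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, label = tag.partition("-")
        # Close the open span unless this tag continues it.
        if etype is not None and not (prefix == "I" and label == etype):
            spans.append((etype, start, i))
            etype = None
        # B- always opens a span; I- opens one when nothing is open.
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, label
    return spans


def entity_prf1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```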
Figure 3: Illustrations of fine-tuning XLNet on the ATIS-NLU task
ATIS - NLU | Avg. (5-run) | Best |
---|---|---|
Accuracy - Intent | 97.51 ± 0.09 | 97.54 |
F1 Score - Slot | 95.48 ± 0.30 | 95.73 |
Table 2: Test set performance of the fine-tuned XLNet-large model on the ATIS-NLU task, with settings: batch size = 16, max sequence length = 128, learning rate = 5e-5, num steps = 2,000
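The "Avg. (5-run)" columns appear to report the mean ± standard deviation of a metric over five fine-tuning runs, presumably with different `--random_seed` values, and "Best" appears to report the top single run. Below is a minimal sketch of that aggregation. The tables do not say whether they use the sample or the population standard deviation, so the use of `statistics.stdev` here is an assumption (`statistics.pstdev` is the alternative). The function names are hypothetical.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, std, best) for one metric across independent runs.

    Uses the sample standard deviation (an assumption; the tables do not
    state which estimator was used).
    """
    return statistics.mean(scores), statistics.stdev(scores), max(scores)


def format_cell(scores):
    """Format scores as the table's 'mean ± std' and 'best' cells."""
    mean, std, best = summarize_runs(scores)
    return f"{mean:.2f} ± {std:.2f}", f"{best:.2f}"
```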
Figure 4: Illustrations of fine-tuning XLNet on the SQuAD v1.1 task
SQuAD v1.1 | Avg. (5-run) | Best |
---|---|---|
Exact Match | xx.xx ± x.xx | 88.61 |
F1 Score | xx.xx ± x.xx | 94.28 |
Table 3: Test set performance of the fine-tuned XLNet-large model on the SQuAD v1.1 task, with settings: batch size = 48, max sequence length = 512, max question length = 64, learning rate = 3e-5, num steps = 8,000
Figure 5: Illustrations of fine-tuning XLNet on the SQuAD v2.0 task
SQuAD v2.0 | Avg. (5-run) | Best |
---|---|---|
Exact Match | xx.xx ± x.xx | 85.72 |
F1 Score | xx.xx ± x.xx | 88.36 |
Table 4: Test set performance of the fine-tuned XLNet-large model on the SQuAD v2.0 task, with settings: batch size = 48, max sequence length = 512, max question length = 64, learning rate = 3e-5, num steps = 8,000
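Exact Match and F1 in the SQuAD tables are the standard SQuAD metrics. Answers are normalized before comparison: they are lowercased, punctuation and the articles a/an/the are removed, and whitespace is collapsed. F1 measures bag-of-tokens overlap. Each prediction is scored against every reference answer, and the best score is kept. SQuAD v2.0 adds unanswerable questions: an empty prediction scores 1 on those and 0 otherwise. The sketch below follows the official SQuAD evaluation scripts' logic. The function names are illustrative, not this repository's API. Dataset-level scores are the per-question averages × 100.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, gold):
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # SQuAD v2.0 rule: score 1 only if both answers are empty.
        return float(pred_toks == gold_toks)
    num_same = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score(prediction, gold_answers):
    """Max EM and F1 over references; no non-empty reference means unanswerable."""
    golds = [g for g in gold_answers if normalize_answer(g)] or [""]
    return (max(exact_match(prediction, g) for g in golds),
            max(f1_score(prediction, g) for g in golds))
```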
Figure 6: Illustrations of fine-tuning XLNet on the CoQA v1.0 task
CoQA v1.0 | Avg. (5-run) | Best |
---|---|---|
Exact Match | xx.xx ± x.xx | 81.8 |
F1 Score | xx.xx ± x.xx | 89.4 |
Table 5: Test set performance of the fine-tuned XLNet-large model on the CoQA v1.0 task, with settings: batch size = 48, max sequence length = 512, max question length = 128, learning rate = 3e-5, num steps = 6,000
Figure 7: Illustrations of fine-tuning XLNet on the QuAC v0.2 task
QuAC v0.2 | Avg. (5-run) | Best |
---|---|---|
F1 Score | xx.xx ± x.xx | 71.5 |
HEQ-Q | xx.xx ± x.xx | 68.0 |
HEQ-D | xx.xx ± x.xx | 11.1 |
Table 6: Test set performance of the fine-tuned XLNet-large model on the QuAC v0.2 task, with settings: batch size = 48, max sequence length = 512, max question length = 128, learning rate = 2e-5, num steps = 8,000
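QuAC's HEQ (human equivalence) metrics in Table 6 compare the system with human annotators. HEQ-Q is the percentage of questions on which the system's F1 matches or exceeds human F1. HEQ-D is the percentage of dialogs in which this holds for every question. Because HEQ-D requires every question in a dialog to pass, it is naturally far lower than HEQ-Q. Below is a simplified sketch that assumes per-question system and human F1 have already been computed. It omits the official scorer's other details, such as how human F1 is derived, and the function name is hypothetical.

```python
def heq(dialogs):
    """Compute HEQ-Q and HEQ-D in percent.

    dialogs: list of dialogs, each a list of (system_f1, human_f1) pairs,
    one pair per question.
    """
    per_question = [sys_f1 >= human_f1
                    for dialog in dialogs
                    for sys_f1, human_f1 in dialog]
    # A dialog counts only if the system matches humans on every question.
    per_dialog = [all(sys_f1 >= human_f1 for sys_f1, human_f1 in dialog)
                  for dialog in dialogs]
    heq_q = 100.0 * sum(per_question) / len(per_question)
    heq_d = 100.0 * sum(per_dialog) / len(per_dialog)
    return heq_q, heq_d
```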