Word Sense Disambiguation using BERT, ELMo and Flair


This repository contains the implementation of the approach proposed in the paper:

Wiedemann, G., Remus, S., Chawla, A., Biemann, C. (2019): Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings. Proceedings of KONVENS 2019, Erlangen, Germany.

Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word in context from a (usually) fixed inventory of sense identifiers. We propose a simple approach that scans through the training data to learn the Contextualized Word Embeddings (CWEs) of sense labels and classifies each ambiguous word on the basis of its cosine similarity to the learnt CWEs for that word.
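The classification idea can be illustrated with a minimal sketch. It is not the code in this repository: the vectors are made-up three-dimensional stand-ins for real contextualized embeddings, the sense labels are placeholders, and the function names `cosine_similarity` and `predict_sense` are hypothetical. The `k` parameter is an assumption that the prediction is a majority vote over the `k` most similar training embeddings.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_sense(query, train_vectors, train_senses, k=1):
    """Majority vote over the k training embeddings most similar to `query`."""
    scored = sorted(
        zip(train_vectors, train_senses),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(sense for _, sense in scored[:k])
    return votes.most_common(1)[0][0]


# Toy "embeddings" for occurrences of the word "bank" (made-up numbers).
train_vectors = [
    [0.9, 0.1, 0.0],  # labelled with the financial sense
    [0.8, 0.2, 0.1],  # labelled with the financial sense
    [0.0, 0.2, 0.9],  # labelled with the river sense
]
train_senses = ["bank_finance", "bank_finance", "bank_river"]

# Embedding of an unseen occurrence that lies close to the river example.
query = [0.1, 0.1, 0.8]
predicted = predict_sense(query, train_vectors, train_senses, k=1)
print(predicted)
```

In a real run the vectors would come from BERT, ELMo, or Flair, and only training embeddings of the same word would be compared with the query.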



This section will guide you through the steps to reproduce all the results that were mentioned in the paper.

We use the following corpus files from the UFSAC repository in our experiments:

| Dataset Name | Training File Name | Testing File Name |
| --- | --- | --- |
| Senseval 2 | senseval2_lexical_sample_train.xml | senseval2_lexical_sample_test.xml |
| Senseval 3 | senseval3task6_train.xml | senseval3task6_test.xml |
| Semcor | semcor.xml | - |
| WNGT | wngt.xml | - |
| Semeval 2007 task 7 | - | semeval2007task7.xml |
| Semeval 2007 task 17 | - | semeval2007task17.xml |
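Before starting an experiment, it can help to confirm that the corpus files listed above are in place. The following is a small sketch, not part of the repository: the helper name `missing_corpus_files` is hypothetical, and it assumes all corpus files sit in a single directory of your choosing.

```python
from pathlib import Path

# (training file, testing file) per dataset, as listed in the table above.
CORPUS_FILES = {
    "Senseval 2": ("senseval2_lexical_sample_train.xml",
                   "senseval2_lexical_sample_test.xml"),
    "Senseval 3": ("senseval3task6_train.xml", "senseval3task6_test.xml"),
    "Semcor": ("semcor.xml", None),
    "WNGT": ("wngt.xml", None),
    "Semeval 2007 task 7": (None, "semeval2007task7.xml"),
    "Semeval 2007 task 17": (None, "semeval2007task17.xml"),
}


def missing_corpus_files(corpus_dir):
    """Return the expected corpus file names that are absent from corpus_dir."""
    corpus_dir = Path(corpus_dir)
    expected = [name for pair in CORPUS_FILES.values() for name in pair if name]
    return [name for name in expected if not (corpus_dir / name).is_file()]


# Assumption: the corpus files were placed in the current directory.
print(missing_corpus_files("."))
```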

Before we proceed to see the steps to reproduce our results, please note a few points:

To reproduce our results:

The repository is composed of three Python files: BERT_Model.py, ELMO_Model.py and Flair_Model.py.

The steps to carry out the experiments are the same for all three files. We demonstrate how to reproduce the results for the BERT model; you can follow the same steps for the other two files.

Run python BERT_Model.py with the following main arguments:

| Dataset | Argument value |
| --- | --- |
| Semeval | SE |
| Semcor | SEM |

You can follow the same steps to reproduce the results for the Flair and ELMo models using the Flair_Model.py and ELMO_Model.py files, respectively.

Below, we provide two examples that show how to use the arguments mentioned above:

  1. To generate the results for SE-2 as mentioned in Table 3 of the paper, run the BERT_Model.py file as follows:

    python BERT_Model.py --use_cuda=True --device=cuda:0 --train_corpus=senseval2_lexical_sample_train.xml --trained_pickle=BERT_embs.pickle --test_corpus=senseval2_lexical_sample_test.xml --start_k=1 --end_k=10 --save_xml_to=SE2.xml --use_euclidean=0 --reduced_search=0

  2. To generate the results for S7-T7 on WNGT as mentioned in Table 5 of the paper, run the BERT_Model.py file as follows:

    python BERT_Model.py --use_cuda=True --device=cuda:0 --train_corpus=wngt.xml --trained_pickle=WNGT_BERT_embs.pickle --test_corpus=semeval2007task7.xml --start_k=1 --end_k=10 --save_xml_to=SE7T17.xml --use_euclidean=0 --reduced_search=1
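
Since the three scripts are run in the same way, the command line can also be assembled programmatically. The sketch below is an illustration rather than part of the repository. It assumes that ELMO_Model.py and Flair_Model.py accept the same flags as BERT_Model.py and that `trained_pickle` takes the same `--` prefix as the other flags. The helper `build_command` and the Flair file names passed to it are hypothetical.

```python
import shlex

# Script names as given in this README.
MODEL_SCRIPTS = {
    "bert": "BERT_Model.py",
    "elmo": "ELMO_Model.py",
    "flair": "Flair_Model.py",
}


def build_command(model, train_corpus, test_corpus, trained_pickle,
                  save_xml_to, reduced_search=0, start_k=1, end_k=10):
    """Return the argument list for one run, mirroring the examples above."""
    return [
        "python", MODEL_SCRIPTS[model],
        "--use_cuda=True",
        "--device=cuda:0",
        f"--train_corpus={train_corpus}",
        f"--trained_pickle={trained_pickle}",  # assumed to need the "--" prefix
        f"--test_corpus={test_corpus}",
        f"--start_k={start_k}",
        f"--end_k={end_k}",
        f"--save_xml_to={save_xml_to}",
        "--use_euclidean=0",
        f"--reduced_search={reduced_search}",
    ]


command = build_command(
    model="flair",
    train_corpus="senseval2_lexical_sample_train.xml",
    test_corpus="senseval2_lexical_sample_test.xml",
    trained_pickle="Flair_embs.pickle",  # hypothetical file name
    save_xml_to="SE2_flair.xml",         # hypothetical file name
)
print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` to launch the experiment.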


The final output is stored in an XML file in which each 'word' tag has an extra attribute named 'WSD' that holds our model's prediction. To test the accuracy, we use a script from the UFSAC repository. The following steps produce Precision, Recall, F1 score, etc. for our prediction file.