
pywsd

Python implementations of Word Sense Disambiguation (WSD) technologies.

NOTE: PyWSD supports only Python 3 as of pywsd>=1.2.0. If you are using Python 2, the last compatible version is pywsd==1.1.7.

Install

pip install -U nltk
python -m nltk.downloader 'popular'
pip install -U pywsd

Usage

$ python
>>> from pywsd.lesk import simple_lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> answer = simple_lesk(sent, ambiguous, pos='n')
>>> print(answer)
Synset('depository_financial_institution.n.01')
>>> print(answer.definition())
a financial institution that accepts deposits and channels the money into lending activities
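
For intuition, the idea behind Lesk-style disambiguation can be sketched without WordNet: choose the sense whose signature (a set of words associated with that sense) overlaps most with the words in the sentence. The snippet below is a toy illustration, not pywsd's implementation. The `TOY_SENSES` inventory, its sense names, and the `toy_lesk` function are all hypothetical and made up for this example.

```python
# Toy illustration of Lesk-style overlap scoring.
# TOY_SENSES and toy_lesk are hypothetical; pywsd derives signatures from WordNet.
TOY_SENSES = {
    "bank": {
        "bank.financial": {"money", "deposit", "loan", "account", "institution"},
        "bank.river": {"river", "water", "slope", "land", "shore"},
    }
}


def toy_lesk(sentence, word, senses=TOY_SENSES):
    """Return the sense name whose signature shares the most words with the sentence."""
    # Naive whitespace tokenization stands in for real tokenization and lemmatization.
    context = set(sentence.lower().split())
    candidates = senses.get(word, {})
    if not candidates:
        return None  # the word is not in the toy inventory
    # Score each sense by the size of the signature/context intersection.
    return max(candidates, key=lambda name: len(candidates[name] & context))


print(toy_lesk("I went to the bank to deposit my money", "bank"))
```

Here the financial sense wins because its signature shares "deposit" and "money" with the sentence, while the river sense shares nothing.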

For all-words WSD, try:

>>> from pywsd import disambiguate
>>> from pywsd.similarity import max_similarity as maxsim
>>> disambiguate('I went to the bank to deposit my money')
[('I', None), ('went', Synset('run_low.v.01')), ('to', None), ('the', None), ('bank', Synset('depository_financial_institution.n.01')), ('to', None), ('deposit', Synset('deposit.v.02')), ('my', None), ('money', Synset('money.n.03'))]
>>> disambiguate('I went to the bank to deposit my money', algorithm=maxsim, similarity_option='wup', keepLemmas=True)
[('I', 'i', None), ('went', 'go', Synset('sound.v.02')), ('to', 'to', None), ('the', 'the', None), ('bank', 'bank', Synset('bank.n.06')), ('to', 'to', None), ('deposit', 'deposit', Synset('deposit.v.02')), ('my', 'my', None), ('money', 'money', Synset('money.n.01'))]
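
The shape of the all-words output, a list of (token, sense) pairs with None for tokens that have no candidate senses, can be sketched as a loop that applies a per-word disambiguator to every token. This is a simplified sketch under stated assumptions: `TOY_SENSES`, `pick_sense`, and `toy_disambiguate` are hypothetical names, the sense labels are invented, and the whitespace tokenizer stands in for the tokenization and lemmatization that a real pipeline would perform.

```python
# Toy all-words disambiguation: one (token, sense) pair per token.
# All names and sense labels below are hypothetical, for illustration only.
TOY_SENSES = {
    "bank": {
        "bank.financial": {"money", "deposit", "loan"},
        "bank.river": {"river", "water", "shore"},
    },
    "deposit": {
        "deposit.put_money": {"money", "bank", "account"},
        "deposit.sediment": {"river", "mineral", "layer"},
    },
}


def pick_sense(word, context):
    """Pick the sense with the largest signature/context overlap, or None."""
    candidates = TOY_SENSES.get(word)
    if not candidates:
        return None  # no known senses for this word
    return max(candidates, key=lambda name: len(candidates[name] & context))


def toy_disambiguate(sentence):
    """Return a list of (token, sense-or-None) pairs for every token."""
    tokens = sentence.split()
    context = {t.lower() for t in tokens}
    return [(t, pick_sense(t.lower(), context)) for t in tokens]


for token, sense in toy_disambiguate("I went to the bank to deposit my money"):
    print(token, sense)
```

Words outside the toy inventory, such as "I" or "the", come back paired with None, mirroring how function words appear in the output above.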

To read pre-computed signatures per synset:

>>> from pywsd.lesk import cached_signatures
>>> cached_signatures['dog.n.01']['simple']
{'canid', 'belgian_griffon', 'breed', 'barker', ... , 'genus', 'newfoundland'}
>>> cached_signatures['dog.n.01']['adapted']
{'canid', 'belgian_griffon', 'breed', 'leonberg', ... , 'newfoundland', 'pack'}

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> dog = wn.synsets('dog')[0]
>>> dog.name()
'dog.n.01'
>>> cached_signatures[dog.name()]['simple']
{'canid', 'belgian_griffon', 'breed', 'barker', ... , 'genus', 'newfoundland'}

Cite

To cite pywsd:

Liling Tan. 2014. Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]. Retrieved from https://github.com/alvations/pywsd

In bibtex:

@misc{pywsd14,
author =   {Liling Tan},
title =    {Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]},
howpublished = {https://github.com/alvations/pywsd},
year = {2014}
}

References