rnnmorph


Morphological analyzer (POS tagger) for Russian and English, based on neural networks and dictionary lookup (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

| Domain | Full tag | PoS tag | Full tag + lemma | Sentence, full tag | Sentence, full tag + lemma |
|---|---|---|---|---|---|
| Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
| VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
| JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
| All | 95.81% | 98.26% | N/A | 74.92% | N/A |

English language, UD EWT test, accuracy

| Dataset | Full tag | PoS tag | Full tag + lemma | Sentence, full tag | Sentence, full tag + lemma |
|---|---|---|---|---|---|
| UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |
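
The sentence-level columns in the tables above are much lower than the per-word ones because a single wrong word makes the whole sentence count as an error. The sketch below illustrates that relationship, assuming token accuracy is the share of correctly tagged words and sentence accuracy is the share of sentences with no errors. The function name `tagging_accuracy` and the toy labels are hypothetical and are not part of rnnmorph or of the official evaluation scripts.

```python
from typing import Sequence, Tuple


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> Tuple[float, float]:
    """Return (token-level accuracy, sentence-level accuracy).

    Each sentence is a sequence of labels, one per word. A sentence is
    counted as correct only if every one of its labels matches.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must have the same number of words")
        matches = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_tokens += len(gold_sent)
        correct_tokens += matches
        if matches == len(gold_sent):
            correct_sentences += 1

    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Toy data: one tagging error in the first sentence.
gold = [["NOUN|Case=Nom", "VERB|Tense=Past"], ["NOUN|Case=Acc"]]
pred = [["NOUN|Case=Nom", "VERB|Tense=Pres"], ["NOUN|Case=Acc"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
print(f"token: {token_acc:.2%}, sentence: {sentence_acc:.2%}")
```

With one error in three words, token accuracy stays at two thirds while sentence accuracy drops to one half, which is the same effect visible in the tables.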

Speed and memory consumption

Speed: from 200 to 600 words per second using CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
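
To check the speed figure on your own hardware, a rough throughput measurement can be sketched as follows. The helper `words_per_second` is hypothetical and not part of rnnmorph; it accepts any callable that takes a list of words, so the stand-in predictor here can be replaced with the `predict` method of a loaded predictor.

```python
import time
from typing import Callable, List, Sequence


def words_per_second(
    predict: Callable[[List[str]], object],
    sentences: Sequence[List[str]],
) -> float:
    """Run predict on every sentence and return processed words per second."""
    total_words = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    for sentence in sentences:
        predict(sentence)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return total_words / elapsed if elapsed > 0 else float("inf")


def dummy_predict(words: List[str]) -> List[str]:
    """Stand-in for a real predictor, used only to keep the sketch runnable."""
    return [word.upper() for word in words]


sentences = [["мама", "мыла", "раму"]] * 100
rate = words_per_second(dummy_predict, sentences)
print(f"{rate:.0f} words per second")
```

The measured rate depends on sentence length and hardware, so results from this sketch are only indicative.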

Install

sudo pip3 install rnnmorph

Usage

from rnnmorph.predictor import RNNMorphPredictor

# Load the Russian model and analyze a tokenized sentence.
predictor = RNNMorphPredictor(language="ru")
forms = predictor.predict(["мама", "мыла", "раму"])

# Each result describes one input word; expected output is shown in comments.
print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
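
The `tag` value is a single string of grammatical features. For downstream code it is often easier to work with a dictionary. The sketch below assumes the `Name=Value` pairs are joined by `|`, as in the output above, and additionally assumes that an empty tag may appear as an empty string or `_`. The helper `parse_tag` is hypothetical and not part of rnnmorph.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a feature string such as 'Case=Nom|Gender=Fem' into a dict."""
    # Assumption: an empty tag is written as "" or "_".
    if not tag or tag == "_":
        return {}
    features: Dict[str, str] = {}
    for item in tag.split("|"):
        name, _, value = item.partition("=")
        features[name] = value
    return features


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])    # Nom
print(features["Number"])  # Sing
```

With a loaded predictor, the same helper could be applied to `forms[0].tag`.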

Training

Simple model training: Open In Colab

Acknowledgements