Short Text Mining in Python

Build Status GitHub release Documentation Status Updates Python 3


This package shorttext is a Python package that facilitates supervised and unsupervised learning for short text categorization. Due to the sparseness of words and the lack of information carried in the short texts themselves, an intermediate representation of the texts and documents are needed before they are put into any classification algorithm. In this package, it facilitates various types of these representations, including topic modeling and word-embedding algorithms.

Since release 1.2.4, it runs on Python 3.8. Since release 1.2.3, support for Python 3.5 was decommissioned. Since release 1.1.7, support for Python 2.7 was decommissioned. Since release 1.0.8, it runs on Python 3.7 with 'TensorFlow' being the backend for keras. Since release 1.0.7, it runs on Python 3.7 as well, but the backend for keras cannot be TensorFlow. Since release 1.0.0, shorttext runs on Python 2.7, 3.5, and 3.6.



Documentation and tutorials for shorttext can be found here:

See tutorial for how to use the package, and FAQ.


To install it, in a console, use pip.

>>> pip install -U shorttext

or, if you want the most recent development version on Github, type

>>> pip install -U git+

Developers are advised to make sure Keras >=2 be installed. Users are advised to install the backend Tensorflow (preferred) or Theano in advance. It is desirable if Cython has been previously installed too.

See installation guide for more details.


To report any issues, go to the Issues tab of the Github page and start a thread. It is welcome for developers to submit pull requests on their own to fix any errors.


If you would like to contribute, feel free to submit the pull requests. You can talk to me in advance through e-mails or the Issues page.

Useful Links


Possible Future Updates