The Latest

The toolkit is now in v0.3. The interfaces have been redesigned to be more modular and intuitive with input from the community, special thanks to @dgopstein and @shaikhismail! We welcome all input to make this tool more useful for you.

Software is Language Too!

We present the Software Language Processing (SLP) library + CLI, here to enable fluent modeling of evolving text! Check out our wiki for why we built this, how it works and a detailed getting started guide. Also keep an eye out on upcoming sibling projects in the SLP-team ecosystem!
Note: this tool was created as part of a publication at FSE'20171 and has since evolved to version 0.3

Quick Start

This core module contains everything needed to get started with modeling language, including code to lex files, build vocabularies, construct (complex) models and run those models. Get the latest Jar and use it either as a command-line tool or add it as a dependency to your project (code API) to get started. Other modules relying on SLP-Core for more exotic applications are forthcoming. Finally: any questions/ideas? Create an Issue! Any improvements/bug fixes? Submit a Pull Request! We greatly appreciate your help in making this useful for others.

Code API

The code is best used as a Java library, and well-documented examples of how to use this code for both NL and SE use-cases are in the src/slp/core/example package of this repo. Use this option if you want to go beyond summary metrics and build more interesting models and scenarios. To get started with the code as a library, either add the latest Jar to your dependencies (Maven dependency coming soon!) or download the whole project and link it to yours. See the wiki for a more thorough explanation and don't hesitate to ask questions as an Issue!

Command Line API

A very quick start: running a 6-gram, JM, cache model on train and test Java directories:
java -jar SLP-Core_v0.2.jar train-test --train train-path --test test-path -m jm -o 6 -l java -e java --cache

A break-down of what happened:

More options

For other options, type java -jar SLP-Core_v0.2.jar --help. See our wiki for some examples of what sequence of commands might be applicable for a typical NLP and SLP use-case with more details. A quick overview of what else can be done:

Release Notes

New in version 0.3

1If you are looking to replicate the FSE'17 results (which used version 0.1), see the "FSE'17 Replication" directory. However, if you want to work with a more advanced version of the code, this is the page to be! Our code-base is ever becoming faster and (hopefully) more accessible without compromising modeling accuracy.

2Every Model supports predict API calls, though prediction is a bit slower than entropy evaluation. Of course, batch-prediction itself doesn't really make sense for practical purposes but is a useful evaluation metric, and the code API can be used for code completion tasks. The scores from this command are MRR scores, which reflect prediction accuracy (closer to 1 is better).