
rex

REx: Relation Extraction. A modernized rewrite of the code from the master's thesis "Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic".

The thesis is here.

Setup

This project uses sbt for build management. If you're unfamiliar with sbt, see the "How to use sbt" section below for some pointers.

Build

To download all dependencies and compile code, run sbt compile.

Test

To run all tests, execute sbt test.

To measure code coverage, run the coverage task immediately before test (for example, sbt coverage test). The coverage report is written as an HTML file.

Command Line Applications

To produce bash scripts that will execute each individual command-line application within this codebase, execute sbt pack.

Data

This project includes data that allows one to distantly supervise relation mentions in text. The files are located under data/: a local README further explains the data content, format, and purpose.

These files are large and are stored using git-lfs. Be sure to follow the appropriate instructions and ensure that you've set up this git plugin (i.e. have performed git lfs install once).
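If git-lfs was not set up before cloning, the files under data/ are small text pointer stubs rather than the real data. The sketch below is one way to spot that situation. It relies on the fact that a git-lfs pointer file begins with the line "version https://git-lfs.github.com/spec/v1". The function names is_lfs_pointer and list_lfs_pointers are hypothetical helpers, not part of this repository.

```shell
# is_lfs_pointer FILE: succeed if FILE is an un-fetched git-lfs pointer stub.
# Only the first bytes are inspected, so large data files are cheap to check.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  head -c 64 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# list_lfs_pointers [DIR]: print every file under DIR (default: data) that
# is still a pointer stub. Hypothetical helper for illustration only.
list_lfs_pointers() {
  find "${1:-data}" -type f | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running list_lfs_pointers data from the repository root should print nothing once the data has been fetched. If it lists files, running git lfs install followed by git lfs pull should replace the stubs with the real content.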

Example

To evaluate relation extraction performance on the UIUC relation dataset using 3-fold cross-validation, first build the executable scripts with sbt pack, then execute:

./target/pack/bin/relation-extraction-learning-main \
learn_eval \
-li data/uiuc_cog_comp_group-entity_and_relation_recognition_corpora/all.corp \
--input_format uiuc \
-cg true \
--cost 1 \
--epsilon 0.003 \
--n_cv_folds 3

For a description of each option, invoke this program with the --help flag, or with no arguments: it will print a detailed help message to stdout.
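To repeat the evaluation with several values of --cost, the invocation can be wrapped in a small shell function. This is a minimal sketch, not part of the repository: learn_eval, sweep_costs, and the REX_BIN, CORPUS, and DRY_RUN variables are hypothetical names, and every flag is copied unchanged from the example command.

```shell
# Script produced by `sbt pack` and the corpus path from the example.
REX_BIN="${REX_BIN:-./target/pack/bin/relation-extraction-learning-main}"
CORPUS="${CORPUS:-data/uiuc_cog_comp_group-entity_and_relation_recognition_corpora/all.corp}"

# learn_eval COST: run the example evaluation with the given --cost value.
# When DRY_RUN is non-empty, print the command instead of running it.
learn_eval() {
  cost="$1"
  set -- "$REX_BIN" learn_eval -li "$CORPUS" --input_format uiuc \
    -cg true --cost "$cost" --epsilon 0.003 --n_cv_folds 3
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# sweep_costs COST...: run learn_eval once per cost value, in order.
sweep_costs() {
  for c in "$@"; do
    learn_eval "$c"
  done
}
```

For example, DRY_RUN=1 sweep_costs 0.1 1 10 prints the three commands without running them. Dropping DRY_RUN runs each evaluation in turn.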

License

Everything within this repository is copyright (2015-) by Malcolm Greaves.

Use of this code is permitted according to the stipulations of the Apache 2 license.

How to use sbt

When using sbt, it is best to start it in the "interactive shell mode". To do this, simply execute from the command line:

$ sbt

After starting up (give it a few seconds), you can execute the following commands:

compile // compiles code
pack // creates executable scripts
test // runs tests
coverage // initializes the code-coverage system; use right before 'test'
reload // re-loads the sbt build definition, including plugin definitions
update // grabs all dependencies

sbt has many more commands, and there are many community plugins that extend its functionality.

Tips

None of this is required; these are just a few suggestions.

We recommend using the following configuration for sbt:

sbt -J-XX:MaxPermSize=768m -J-Xmx2g -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled

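This gives sbt more memory (a 2 GB heap and a larger permanent generation), switches to the concurrent mark-sweep (CMS) garbage collector, and enables class unloading under that collector. These flags target older JVMs: the permanent generation was removed in Java 8 and the CMS collector in JDK 14, so newer JVMs ignore or reject the corresponding options.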

Also, to limit the logging output of the Spark framework, export this environment variable before running tests:

export SPARK_CONF_DIR="<YOUR_PATH_TO_THIS_REPO>/src/main/resources"
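Rather than typing the repository path by hand, it can be derived from git. The sketch below assumes the command is run from inside a git checkout of this repository. spark_conf_dir is a hypothetical helper name. git rev-parse --show-toplevel is a standard git command that prints the absolute path of the working tree's root.

```shell
# spark_conf_dir [REPO_ROOT]: print the directory to use as SPARK_CONF_DIR.
# With no argument, the repository root is looked up through git.
spark_conf_dir() {
  printf '%s/src/main/resources\n' "${1:-$(git rev-parse --show-toplevel)}"
}
```

With this defined, export SPARK_CONF_DIR="$(spark_conf_dir)" sets the variable from any directory inside the checkout.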