Odinson can be used to rapidly query a natural language knowledge base and extract structured relations. Query patterns can be designed over surface features (e.g., #1), syntax (e.g., #2), or a combination of both (e.g., #3-5). These examples were executed over a collection of 8,479 scientific papers, corresponding to 1,105,737 sentences. Queries execute quickly enough that a user can develop them interactively, receiving immediate feedback on the coverage and precision of the patterns at scale. Please see our forthcoming LREC 2020 paper for technical details and evaluation.
Odinson supports several features:
And there are many more on the way:
We would also love to hear any questions, requests, or suggestions you may have.
It consists of several subprojects:
The three apps in extra are:
To build docker images locally, run the following command via sbt:
sbt dockerize
We also publish images to Docker Hub (see below for information on our docker images).
docker pull lumai/odinson-extras:latest
See our repository for other tags.
docker pull lumai/odinson-rest-api:latest
See our repository for other tags.
docker run \
--name="odinson-extras" \
-it \
--rm \
-e "HOME=/app" \
-e "JAVA_OPTS=-Dodinson.extra.processorType=CluProcessor" \
-v "/path/to/data/odinson:/app/data/odinson" \
--entrypoint "bin/annotate-text" \
"lumai/odinson-extras:latest"
NOTE: Replace /path/to/data/odinson with the path to the directory containing a directory called text containing the .txt files you want to annotate. Compressed OdinsonDocument JSON will be written to a directory called docs under whatever you use for /path/to/data/odinson.
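The directory layout described in the note above can be prepared before running the container. This is a minimal sketch, not part of the Odinson tooling: the ODINSON_DATA variable, its default location, the file name example.txt, and the sample sentence are all assumptions made for illustration. Only the text subdirectory name and the .txt extension come from the note.

```shell
# ODINSON_DATA is a hypothetical variable name; point it at your own data directory.
ODINSON_DATA="${ODINSON_DATA:-$PWD/odinson-data}"

# annotate-text reads .txt files from a subdirectory called "text".
mkdir -p "$ODINSON_DATA/text"

# A made-up one-sentence input file, used here only as a placeholder.
printf 'Smoking causes cancer.\n' > "$ODINSON_DATA/text/example.txt"

# List the input files that would be annotated.
ls "$ODINSON_DATA/text"
```

The value of ODINSON_DATA would then take the place of /path/to/data/odinson in the -v option of the docker run command.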
docker run \
--name="odinson-extras" \
-it \
--rm \
-e "HOME=/app" \
-v "/path/to/data/odinson:/app/data/odinson" \
--entrypoint "bin/index-documents" \
"lumai/odinson-extras:latest"
NOTE: Replace /path/to/data/odinson with the path to the directory containing docs. The index will be written to a directory called index under whatever you use for /path/to/data/odinson.
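After annotation and indexing, the data directory should contain the text, docs, and index subdirectories named in the notes above. The helper below is a small sketch for checking that layout before starting the shell or the REST API. The function name check_odinson_dir is hypothetical and is not shipped with Odinson.

```shell
# Report which of the expected subdirectories exist under a data directory.
# Returns 0 if text, docs, and index are all present, 1 otherwise.
check_odinson_dir() {
  dir="$1"
  missing=0
  for sub in text docs index; do
    if [ -d "$dir/$sub" ]; then
      echo "found:   $dir/$sub"
    else
      echo "missing: $dir/$sub"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_odinson_dir /path/to/data/odinson prints one line per subdirectory and exits non-zero if any stage has not been run yet.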
docker run \
--name="odinson-extras" \
-it \
--rm \
-e "HOME=/app" \
-v "/path/to/data/odinson:/app/data/odinson" \
--entrypoint "bin/shell" \
"lumai/odinson-extras:latest"
NOTE: Replace /path/to/data/odinson with the path to the directory containing index (created via the IndexDocuments runnable).
docker run \
--name="odinson-rest-api" \
-it \
--rm \
-e "HOME=/app" \
-p "0.0.0.0:9001:9000" \
-v "/path/to/data/odinson:/app/data/odinson" \
"lumai/odinson-rest-api:latest"
After starting the service, open your browser to localhost:9001.
NOTE: Replace /path/to/data/odinson with the path to the directory containing docs and index (created via the AnnotateText and IndexDocuments runnables).
Logs can be viewed by running docker logs -f "odinson-rest-api"
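The container may take a moment to start, so it can be convenient to poll the published port before opening the browser. The following is a sketch only: wait_for_url is a hypothetical helper, and it assumes that curl is installed and that the service answers a plain GET on its root URL with a successful status, which is not confirmed here.

```shell
# Poll a URL once per second until it responds successfully or the tries run out.
# Usage: wait_for_url URL [TRIES]   (TRIES defaults to 30)
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl exit non-zero on HTTP error statuses (400 and above).
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not responding: $url" >&2
  return 1
}
```

With the port mapping used above, the call would be wait_for_url "http://localhost:9001".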
We have prepared a few example queries to show how the system works. They were run over a collection of 8,479 scientific papers (1,105,737 sentences). Because the queries execute so quickly, a user can develop them interactively, receiving immediate feedback on the coverage and precision of the patterns at scale.
This example shows Odinson applying a pattern over surface features (i.e., words) to extract mentions of causal relations. Odinson finds 3,774 sentences that match the pattern in 0.18 seconds.
This example shows how Odinson can also use patterns over syntax. In this case, the pattern targets hypernym relations and finds 10,562 matches in 0.37 seconds.
This example shows how surface and syntax can be combined in a single pattern. The pattern matches 12 sentences in our corpus of 1,105,737 sentences in 0.01 seconds.
This example shows how we can match over different aspects of tokens (lemmas, in this case). Note that the ability to utilize syntax helps with the precision of the extractions, as compared with the overly simple surface rule above. Odinson finds 5,489 matches in 0.18 seconds.
This is an example of a slightly more complex pattern. Odinson applies it over our corpus and finds 228 matches in 0.04 seconds.
We are also working on a web interface that will simplify debugging by displaying more information than the shell. This interface will allow us to display syntactic information when needed. We would also like to be able to interact with it to correct extractions or bootstrap patterns.