Datashare

Download

https://datashare.icij.org/

Documentation

Datashare's user guide can be found here: https://icij.gitbook.io/datashare/

Frontend

This repository is only the backend part of Datashare.

The frontend lives in a separate repository: https://github.com/ICIJ/datashare-client

Description

Datashare is a free, open-source desktop application developed by the non-profit International Consortium of Investigative Journalists (ICIJ).

Datashare allows investigative journalists to:

Translation of the interface

You're welcome to suggest translations on Datashare's Crowdin page: https://crwd.in/datashare. Please contact us if you would like to add a language.

Installing and using

Using with Elasticsearch

You can download the launch script (datashare.sh) at https://datashare.icij.org/.

To access the web GUI, open a terminal in your documents folder, launch path/to/datashare.sh, then open http://localhost:8080 in a browser.
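The launch script may take a moment before the GUI answers. As a minimal sketch, the function below polls the address until it responds. The function name `wait_for_datashare`, the default of 30 attempts and the one-second pause are illustrative choices, not part of Datashare, and the sketch assumes the root URL returns a successful HTTP status once the server is up.

```shell
# wait_for_datashare [URL] [ATTEMPTS]
# Poll URL with curl until it answers with a successful HTTP status.
# Hypothetical helper: the name and defaults are assumptions for illustration.
wait_for_datashare() {
  local url="${1:-http://localhost:8080}"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors (status >= 400)
    if curl --silent --fail --output /dev/null "$url"; then
      echo "Datashare is reachable at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no answer from $url after $attempts attempts" >&2
  return 1
}
```

After starting datashare.sh in another terminal, calling `wait_for_datashare` with no arguments checks http://localhost:8080.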

Using only Named Entity Recognition

You can also run the Datashare Docker container solely to expose the name-finding (NER) API over HTTP.

Just run:

docker run -ti -p 8080:8080 -v /path/to/dist/:/home/datashare/dist icij/datashare:0.10 -m NER

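A bit of explanation:

-ti allocates an interactive terminal for the container.
-p 8080:8080 publishes the container's port 8080 on port 8080 of the host.
-v /path/to/dist/:/home/datashare/dist mounts the host directory /path/to/dist/ at /home/datashare/dist inside the container.
icij/datashare:0.10 is the image name and version tag.
-m NER is passed to Datashare itself; judging by this section's title, it selects the NER-only mode.

Then query the server with curl: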

curl -i localhost:8080/ner/findNames/CORENLP --data-binary @path/to/a/file.txt

The last path part (CORENLP) is the framework. You can choose it among CORENLP, IXAPIPE, MITIE or OPENNLP.
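Since the framework is just the last path segment, the request can be wrapped in a small helper that rejects unknown frameworks before calling the server. This is only a sketch: the names `ner_url`, `find_names` and the `DATASHARE_HOST` variable are hypothetical and do not come from Datashare; the URL layout and the four framework names are the ones given above.

```shell
# Host and port of the NER server; override by exporting DATASHARE_HOST.
# (Hypothetical variable, introduced for this example only.)
DATASHARE_HOST="${DATASHARE_HOST:-localhost:8080}"

# ner_url FRAMEWORK
# Print the findNames URL for a framework, or fail for an unknown one.
ner_url() {
  case "$1" in
    CORENLP|IXAPIPE|MITIE|OPENNLP)
      printf 'http://%s/ner/findNames/%s\n' "$DATASHARE_HOST" "$1"
      ;;
    *)
      echo "unknown framework: $1 (expected CORENLP, IXAPIPE, MITIE or OPENNLP)" >&2
      return 1
      ;;
  esac
}

# find_names FRAMEWORK FILE
# Post the raw content of FILE to the server, as in the curl command above.
find_names() {
  local url
  url="$(ner_url "$1")" || return 1
  curl -i "$url" --data-binary "@$2"
}
```

For example, `find_names MITIE path/to/a/file.txt` sends the same request as the curl command above, with MITIE in place of CORENLP.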

Extract Text from Files

Implementations

Support

Tika File Formats

Extract Persons, Organizations or Locations from Text

Implementations

Natural Language Processing Stages Support

NlpStage
TOKEN
SENTENCE
POS
NER

Named Entity Recognition Language Support

NlpStage.NER               ENGLISH  SPANISH  GERMAN  FRENCH   CHINESE
NlpPipeline.Type.CORENLP   X        X        X       (w/ EN)  X
NlpPipeline.Type.OPENNLP   X        X        -       X        -
NlpPipeline.Type.IXAPIPE   X        X        X       -        -
NlpPipeline.Type.MITIE     X        X        X       -        -
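When scripting against the NER API, it can help to check this table before picking a framework. The lookup below is a sketch that simply transcribes the table, assuming each row's cells follow the header order ENGLISH, SPANISH, GERMAN, FRENCH, CHINESE and that "(w/ EN)" is the FRENCH cell of the CORENLP row. The function name `ner_support` is hypothetical.

```shell
# ner_support PIPELINE LANGUAGE
# Print the table cell for the pair: "X", "-" or "(w/ EN)".
# Returns 2 for a pipeline or language that is not in the table.
ner_support() {
  case "$1" in
    CORENLP|OPENNLP|IXAPIPE|MITIE) ;;
    *) echo "unknown pipeline: $1" >&2; return 2 ;;
  esac
  case "$2" in
    ENGLISH|SPANISH|GERMAN|FRENCH|CHINESE) ;;
    *) echo "unknown language: $2" >&2; return 2 ;;
  esac
  # Only the exceptions are listed; every other cell of the table is "X".
  case "$1:$2" in
    CORENLP:FRENCH) echo "(w/ EN)" ;;
    OPENNLP:GERMAN|OPENNLP:CHINESE) echo "-" ;;
    IXAPIPE:FRENCH|IXAPIPE:CHINESE) echo "-" ;;
    MITIE:FRENCH|MITIE:CHINESE) echo "-" ;;
    *) echo "X" ;;
  esac
}
```

For instance, `ner_support OPENNLP GERMAN` prints `-`, and `ner_support CORENLP FRENCH` prints `(w/ EN)`.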

Named Entity Categories Support

NamedEntity.Category
ORGANIZATION
PERSON
LOCATION

Parts-of-Speech Language Support

NlpStage.POS             ENGLISH  SPANISH  GERMAN  FRENCH
NlpPipeline.Type.CORE    X        X        X       X
NlpPipeline.Type.OPEN    X        X        X       X
NlpPipeline.Type.IXA     X        X        X       X
NlpPipeline.Type.MITIE   -        -        -       -

Store and Search Documents and Named Entities

Implementations

Compilation / Build

Building requires JDK 8 and Maven 3, plus three running services reachable under fixed hostnames: a PostgreSQL server (hostname postgresql) with two databases, datashare and test, both writable by user test with password test; an Elasticsearch instance (hostname elasticsearch); and a Redis server (hostname redis).

mvn validate
mvn -pl datashare-db liquibase:update
mvn test
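Because the build depends on three hostnames being resolvable, a failed lookup is a common reason for the Maven steps to break. The sketch below checks the hostnames first and then runs the three commands above in order, stopping at the first failure. The function names are hypothetical, and the check relies on `getent`, which is available on Linux with glibc but not on every system.

```shell
# Print each hostname required by the build that does not resolve.
# Uses getent (an assumption about the platform: Linux/glibc).
unresolved_hosts() {
  local host
  for host in postgresql elasticsearch redis; do
    getent hosts "$host" >/dev/null 2>&1 || echo "$host"
  done
}

# Check the hostnames, then run the Maven steps from this section in order.
build_datashare() {
  local missing
  missing="$(unresolved_hosts)"
  if [ -n "$missing" ]; then
    echo "unresolved hostnames:" $missing >&2
    return 1
  fi
  mvn validate &&
    mvn -pl datashare-db liquibase:update &&
    mvn test
}
```

Running `build_datashare` from the repository root either reports the hostnames that still need to be set up or runs the build. It only checks name resolution, not that the services accept connections.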

License

Datashare is released under the GNU Affero General Public License.

Feedback

We welcome feedback as well as contributions!

For any bug, question, comment or (pull) request, please contact us at [email protected]