EliIE (Eligibility Criteria Information Extraction)

Introduction

EliIE is a parser for free-text clinical trial eligibility criteria (CTEC). It parses free-text CTEC and formalizes the results into the OMOP CDM v5 table format.
The parser was trained on 250 clinical trials on Alzheimer's disease. The annotation guidelines are in the folder Supple Materials.

Developed in Dr. Chunhua Weng's lab in the Department of Biomedical Informatics at Columbia University

Author: Tian Kang
Affiliation: Department of Biomedical Informatics, Columbia University
Contact Email: tk2624@cumc.columbia.edu
Last update: June 20, 2016 (added negation detection in the NER step)
Version: 1.0
Citation: EliIE: An open-source information extraction system for clinical trial eligibility criteria

Primary steps:

  1. Entity recognition
  2. Attribute recognition
  3. Clinical relation identification
  4. Data standardization

Example input:

Example output:

User Guide

First, download all the code and decompress it.

Fast Usage:

  1. Open wrapper_for_parsing.sh.
  2. Set the parameter lists to match your task.
  3. Run "sh wrapper_for_parsing.sh"; the parsing results will be generated as XML files.
    (To see example output, run "sh wrapper_for_parsing.sh" directly without changing anything.)

Step-by-step Usage:

  1. NER step: run
    python NamedEntityRecognition.py <input directory> <input text name> <output directory>
  2. Clinical relation step: run
    python Relation.py <output directory> <input text name>

Example commands:

  1. python NamedEntityRecognition.py Tempfile test.txt Tempfile
  2. python Relation.py Tempfile test.txt

The example output would be Tempfile/test_NER.xml and Tempfile/test_Parsed.xml.
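
The two steps can also be chained from Python. The following is a minimal sketch, not part of EliIE itself: the helper names (build_commands, expected_outputs, run_pipeline) are hypothetical, the interpreter is assumed to be available as "python", and the output file names are inferred only from the example above (test.txt giving test_NER.xml and test_Parsed.xml).

```python
import os
import subprocess


def build_commands(input_dir, text_name, output_dir, python="python"):
    """Return the argument lists for the NER step and the relation step, in order."""
    ner = [python, "NamedEntityRecognition.py", input_dir, text_name, output_dir]
    relation = [python, "Relation.py", output_dir, text_name]
    return [ner, relation]


def expected_outputs(text_name, output_dir):
    """Infer output paths from the example naming (an assumption, not a documented rule)."""
    stem = os.path.splitext(text_name)[0]
    return [os.path.join(output_dir, stem + suffix)
            for suffix in ("_NER.xml", "_Parsed.xml")]


def run_pipeline(input_dir, text_name, output_dir, repo_dir="."):
    """Run both steps from repo_dir; raises CalledProcessError if a step fails."""
    for argv in build_commands(input_dir, text_name, output_dir):
        subprocess.check_call(argv, cwd=repo_dir)
    return expected_outputs(text_name, output_dir)
```

Calling run_pipeline("Tempfile", "test.txt", "Tempfile") from the directory that holds the scripts would issue the same two commands as the example. Stopping at the first failing step avoids running Relation.py against a missing NER output.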

Prerequisites:

  1. MetaMap: this parser assumes MetaMap is installed and requires the MetaMap support services to be running. If MetaMap is installed in $MM, the services can be started with:
    $MM/bin/skrmedpostctl start
    $MM/bin/wsdserverctl start

    Go to features_dir and open metamap_tag.sh; follow the guidance in that file to change the MetaMap root directory and start running.

  2. Required Python packages:
    nltk
    networkx
    codecs (part of the Python standard library)
    libsvm
    practnlptools

  3. CRF++
    Easy to install by following the instructions:
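
Before running the parser, it can help to check that the prerequisites above are visible from Python. The sketch below is an illustration under stated assumptions, not an EliIE script: the import names are assumed to match the package names listed above, libsvm is left out because its import name depends on how it was installed, and crf_learn and crf_test are the command-line tools that CRF++ provides. All function names are hypothetical.

```python
import importlib
import os

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2 fallback

# Assumption: import names equal the package names in the list above.
ASSUMED_MODULES = ["nltk", "networkx", "codecs", "practnlptools"]
# CRF++ command-line tools, expected on PATH after installation.
ASSUMED_EXECUTABLES = ["crf_learn", "crf_test"]


def missing_modules(names):
    """Return the module names that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # any failure to import counts as missing
            missing.append(name)
    return missing


def missing_executables(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if which(name) is None]


def metamap_control_scripts(mm_root):
    """Build the paths of the two MetaMap service scripts under mm_root ($MM)."""
    return [os.path.join(mm_root, "bin", script)
            for script in ("skrmedpostctl", "wsdserverctl")]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]
```

For example, missing_modules(ASSUMED_MODULES) and missing_executables(ASSUMED_EXECUTABLES) list what still needs installing, and missing_files(metamap_control_scripts(os.environ["MM"])) checks the MetaMap service scripts when $MM is set. The check only confirms that the scripts exist, not that the services are running.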

Functions Under Development

  1. Standardize entity and attribute concepts using OHDSI standards
  2. Convert the final format into JSON
  3. Extend the use case to more diseases
