Predictive Modeling in Urgent Care

This is the code repository for the manuscript "Predictive Modeling in Urgent Care: A Comparative Study of Machine Learning Approaches" by Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou.

Manuscript Abstract

Objective: The growing availability of rich clinical data such as patients' electronic health records (EHR) provides great opportunities to address a broad range of real-world questions in medicine. At the same time, artificial intelligence and machine learning based approaches have shown great promise in extracting insights from those data and helping with various clinical problems. The goal of this study is to conduct a systematic comparative study of different machine learning algorithms for several predictive modeling problems in urgent care.

Design: We assess the performance of four benchmark prediction tasks (e.g., mortality and readmission prediction, differential diagnostics, and disease marker discovery) using medical histories, physiological time-series, and demographics data from the Medical Information Mart for Intensive Care (MIMIC-III) database.

Measurements: For each given task, performance was estimated using standard measures including the area under the receiver operating characteristic curve (AUC), F1 score, sensitivity, and specificity. Micro-averaged AUC was used for multi-class classification models.

Results and Discussion: Our results suggest that recurrent neural networks show the most promise in mortality prediction, where temporal patterns in physiologic features alone can capture in-hospital mortality risk (AUC > 0.90). Temporal models did not provide additional benefit compared to deep models in differential diagnostics. When comparing the training-testing behaviors of readmission and mortality models, we illustrate that readmission risk may be independent of patient stability at discharge. We also introduce a multi-class prediction scheme for length of stay which preserves sensitivity and AUC with outliers of increasing duration despite a decrease in sample size.

Usage

Requirements

MIMIC-III

Please apply for access to the publicly available MIMIC-III database via https://www.physionet.org/.

Instructions for Use

Workflow: MIMIC-III Access -> Obtain Views and Tables -> Preprocessing -> Pipeline

  1. Obtain access to MIMIC-III and clone this repo to a local folder. Create a local MIMIC-III folder with the following subfolders:
    • .../local_mimic
    • .../local_mimic/views
    • .../local_mimic/tables
    • .../local_mimic/save

These paths will be important for storing views and pivot tables, which will be used for preprocessing.
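The folder layout above can be created in one pass. The following is a minimal Python sketch, not part of this repository: the function name make_local_mimic is hypothetical, and the root path you pass in stands in for the elided .../local_mimic location.

```python
from pathlib import Path


def make_local_mimic(root):
    """Create root plus its views, tables and save subfolders; return their paths."""
    root = Path(root)
    paths = {name: root / name for name in ("views", "tables", "save")}
    for path in paths.values():
        # parents=True also creates root; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling make_local_mimic with your chosen root returns a dictionary whose "views", "tables" and "save" entries can later be passed to preprocessing.py.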

  2. Build the MIMIC-III database using PostgreSQL, following the instructions outlined in the MIMIC-III repository: https://github.com/MIT-LCP/mimic-code/tree/master/buildmimic/postgres.

  3. Go to the pivot folder in the MIMIC-III repository: https://github.com/MIT-LCP/mimic-code/tree/master/concepts/pivot. Run the following .sql scripts to build a local set of .csv files of the pivot tables:

    • pivoted-bg.sql
    • pivoted_vital.sql
    • pivoted_lab.sql
    • pivoted_gcs.sql (optional)
    • pivoted_uo.sql (optional)

When running the .sql scripts, set the delimiter to ',' when saving each materialized view as a .csv file.

For example,
mimic=> \copy (select * FROM mimiciii.icustay_detail) to 'icustay_detail.csv' delimiter ',' csv header;

After running these scripts, you should have local .csv files of the pivot tables. Place them in the local views folder, e.g. .../local_mimic/views/pivoted-bg.csv. Remember this .../local_mimic/views folder, as it will be the path_views input for preprocessing.
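The export has to be repeated for each pivot view. The sketch below only builds psql \copy command strings in the same form as the example above; it does not connect to the database. The view names (pivoted_bg, pivoted_vital, pivoted_lab), the two underscore-style .csv names, and the helper copy_command are assumptions for illustration, so check the names the .sql scripts actually create in your database.

```python
def copy_command(view, csv_name, schema="mimiciii"):
    """Build a psql \\copy command that exports one view as a comma-delimited .csv."""
    return (
        f"\\copy (select * FROM {schema}.{view}) "
        f"to '{csv_name}' delimiter ',' csv header;"
    )


# Assumed view names -> .csv file names; adjust both to your setup.
EXPORTS = {
    "pivoted_bg": "pivoted-bg.csv",
    "pivoted_vital": "pivoted_vital.csv",
    "pivoted_lab": "pivoted_lab.csv",
}

for view, csv_name in EXPORTS.items():
    print(copy_command(view, csv_name))
```

Each printed line can be pasted at the mimic=> prompt. Run psql from the views folder (or use full paths in the file names) so the .csv files land where preprocessing expects them.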

  4. Go to the demographics folder in the MIMIC-III repository: https://github.com/MIT-LCP/mimic-code/tree/master/concepts/demographics.

Run icustay-detail.sql and export the resulting icustay_detail view to a local .csv file. Place the .csv file in the local views folder, e.g. .../local_mimic/views/icustay_details.csv.

A minor change needs to be made in icustay_details.csv:
rename the column header 'admission_age' to 'age' in the .csv file.
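For large exports, editing the header by hand can be awkward. As an alternative, here is a small sketch using only the Python standard library; the function rename_column is hypothetical and not part of this repository. It rewrites the file through a temporary copy and leaves the data rows unchanged in value, although quoting may be normalized by the csv module.

```python
import csv
import os
import tempfile


def rename_column(csv_path, old="admission_age", new="age"):
    """Rename header `old` to `new` in csv_path; return True if the file changed."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None or old not in header:
            return False  # empty file, column absent, or already renamed
        header = [new if name == old else name for name in header]
        # Write to a temporary file first so a failure cannot truncate the original.
        with tempfile.NamedTemporaryFile(
            "w", newline="", dir=directory, delete=False, suffix=".tmp"
        ) as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            writer.writerows(reader)
    os.replace(dst.name, csv_path)  # swap the rewritten file into place
    return True
```

Pass the path of your icustay_details.csv as csv_path. Running it a second time is harmless: it returns False and leaves the file untouched.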

  5. Obtain a local copy of the following tables from MIMIC-III:
    • admissions.csv
    • diagnoses_icd.csv
    • d_icd_diagnoses.csv

These can be obtained directly from PhysioNet as compressed files. While tables such as chartevents are large, the above tables are quite small and easy to query directly if a local copy is available.
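If you download the compressed files, they need to be unpacked before use. The sketch below assumes the downloads are gzip archives named like ADMISSIONS.csv.gz and that lowercase .csv names are what the rest of the workflow expects; both are assumptions, and extract_tables is a hypothetical helper rather than part of this repository.

```python
import gzip
import shutil
from pathlib import Path

TABLES = ("admissions", "diagnoses_icd", "d_icd_diagnoses")


def extract_tables(download_dir, tables_dir, tables=TABLES):
    """Decompress the wanted <NAME>.csv.gz files into tables_dir as <name>.csv."""
    download_dir, tables_dir = Path(download_dir), Path(tables_dir)
    tables_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(download_dir.glob("*.csv.gz")):
        name = archive.name[: -len(".csv.gz")].lower()
        if name not in tables:
            continue  # skip tables that are not needed, e.g. chartevents
        target = tables_dir / f"{name}.csv"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so memory use stays small
        written.append(target)
    return written
```

The returned list shows which tables were written, which makes it easy to spot a missing download.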

Save these tables under .../local_mimic/tables folder. Make the following changes:

  6. Run preprocessing.py with inputs:
    • --path_tables <path_tables>
    • --path_views <path_views>
    • --path_save <path_save>

<path_tables> and <path_views> should correspond to the folders under which the local tables and views (pivots and icustays-details) are saved. <path_save> corresponds to the desired folder to save your variables for training and beyond.
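A small wrapper can catch a wrong path before the preprocessing run starts. This sketch uses only the three flags listed above; the helper build_command is hypothetical, and creating the save folder up front is an assumption (the script may already do this itself).

```python
import sys
from pathlib import Path


def build_command(path_tables, path_views, path_save, script="preprocessing.py"):
    """Check the input folders and return the argument list for the script."""
    for label, folder in (("tables", path_tables), ("views", path_views)):
        if not Path(folder).is_dir():
            raise FileNotFoundError(f"{label} folder not found: {folder}")
    Path(path_save).mkdir(parents=True, exist_ok=True)  # output folder may be new
    return [
        sys.executable, script,
        "--path_tables", str(path_tables),
        "--path_views", str(path_views),
        "--path_save", str(path_save),
    ]
```

From the repository root, the returned list can be passed to subprocess.run(cmd, check=True), which raises an error if preprocessing.py exits with a non-zero status.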

preprocessing.py will generate the following files:

If you find any errors or issues, please do not hesitate to report them.