PCORI: Improving Deep Learning for Clinical Prediction Model Development and Interpretability

This project aims to develop, validate, and interpret a predictive deep learning (DL) model for use in patient-centered comparative effectiveness research (CER), using methods that incorporate patient partner and clinical stakeholder perspectives throughout model development. The primary focus is Human-Centered AI for Clinical Decision Support. The project will develop early risk prediction models for Opioid Use Disorder (OUD) and opioid overdose (OD), and generalize the framework to other conditions such as heart failure.

PCORI Project Website

Project Overview

This project addresses the gap between ML model development and clinical adoption through:

Transparent Models: Interpretable predictions with SHAP-based explanations
Clinician-in-the-Loop Validation: Interactive dashboards for clinical review
Scalable Pipeline: Data processing for large EHR datasets
Multi-Model Support: LightGBM, Random Forest, LSTM, GRU

Program Management

Field	Value
Contract	PCORI 23C3
PI	Fusheng Wang, PhD and Richard Rosenthal, MD
Institution	Stony Brook University
Start Date	February 1, 2025
Reporting	Interim reports every 6 months

Key Milestones

Milestone	Due Date	Status
Study Protocol Submission	2025-03-01	Complete
IRB Documentation	2025-03-01	Complete
Cohort Identification	2025-05-15	Complete
Initial Model Design	2025-07-15	Complete
Progress Report #1	2025-07-31	Complete
Model Release 1 (GitHub)	2025-08-31	Complete
Interpretability Methods	2025-10-31	In Progress

Governance

AHDT Meetings: Bi-monthly All Hands Design Team meetings
Quarterly Surveys: Stakeholder engagement surveys
PCORI Oversight: Reports via PCORI Online portal

Research Compliance

IRB Status

Protocol: IRB2023-00456 (Stony Brook University IRB)
Status: Not Human Research

Data Governance

Data Source: Cerner Health Facts under executed DUA
De-identification: HIPAA Safe Harbor compliant
Access Control: Role-based, audit-logged
PHI: No direct identifiers in any dataset

Evidence & Validation

Datasets

Dataset	Records	Features	Purpose
Health Facts	~70M encounters	500+	Training/validation
Synthetic	10,000 patients	50	Development/testing

Model Performance (Internal Validation)

Model	AUROC	AUPRC	Accuracy
LightGBM	0.82	0.61	0.78
LSTM (T=10)	0.79	0.58	0.75
Logistic Reg.	0.75	0.52	0.72

External validation planned for Q2 2026.
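
For readers unfamiliar with the three metrics in the table above, the following is a minimal, dependency-free sketch of how they are conventionally defined. It is not the project's evaluation code; the function names and the toy labels and scores are illustrative assumptions. Labels are assumed to be 0/1 integers, and accuracy is assumed to use a 0.5 decision threshold, which the table does not state.

```python
def auroc(y_true, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """AUPRC as average precision: sum of recall increments times precision."""
    order = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = fp = 0
    ap = 0.0
    i = 0
    while i < len(order):
        j = i
        tp_group = 0
        # Samples with the same score share one threshold.
        while j < len(order) and order[j][0] == order[i][0]:
            tp_group += order[j][1]
            j += 1
        tp += tp_group
        fp += (j - i) - tp_group
        ap += (tp_group / n_pos) * (tp / (tp + fp))
        i = j
    return ap


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of correct predictions at an assumed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Toy data, not project results.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, s), average_precision(y, s), accuracy(y, s))
```

AUPRC depends on outcome prevalence, so it should be read against the positive rate of the cohort rather than against the 0.5 baseline that applies to AUROC.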

Fairness Monitoring

Subgroup analysis by demographics (age, gender, race/ethnicity)
Calibration curves across groups
Disparate impact assessment planned
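
The subgroup checks listed above could be sketched as follows. This is an illustrative example only: the README does not specify which fairness metrics the project will use, so the disparate impact ratio (each group's flag rate divided by a reference group's flag rate) and the calibration gap (mean predicted risk minus observed event rate, per group) are assumptions, as are the function names and toy data.

```python
from collections import defaultdict


def selection_rates(groups, y_pred):
    """Fraction of patients flagged (y_pred == 1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for g, p in zip(groups, y_pred):
        counts[g][0] += p
        counts[g][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}


def disparate_impact(groups, y_pred, reference):
    """Ratio of each group's flag rate to the reference group's flag rate."""
    rates = selection_rates(groups, y_pred)
    return {g: r / rates[reference] for g, r in rates.items()}


def calibration_gap(groups, y_true, y_prob):
    """Mean predicted risk minus observed event rate, per group."""
    sums = defaultdict(lambda: [0.0, 0, 0])  # group -> [sum_prob, events, n]
    for g, y, p in zip(groups, y_true, y_prob):
        sums[g][0] += p
        sums[g][1] += y
        sums[g][2] += 1
    return {g: (ps - ys) / n for g, (ps, ys, n) in sums.items()}


# Toy data with hypothetical group labels, not project results.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
y_prob = [0.5, 0.5, 0.25, 0.25, 0.5, 0.25, 0.25, 0.0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

print(disparate_impact(groups, y_pred, reference="A"))
print(calibration_gap(groups, y_true, y_prob))
```

A ratio below 0.8 is a commonly cited screening threshold (the "four-fifths rule"), but in risk prediction being flagged is not necessarily a favorable outcome, so the ratio is a prompt for review rather than a verdict.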

Limitations & Intended Use

Intended Use

Research and model development
Educational purposes
Stakeholder-in-the-loop validation prototyping

Limitations

Not for clinical decisions: Models not validated for direct patient care
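Population specificity: Trained on a single data source (Cerner Health Facts); performance on other populations is untested until external validation is complete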
No FDA clearance: Requires separate regulatory review for deployment

Architecture

┌──────────────────────────────────────────────────────────────┐
│                    PCORI ML Infrastructure                    │
├──────────────────────────────────────────────────────────────┤
│  Raw EHR Data → Pipeline (ETL) → Feature Store (Parquet)     │
│                        ↓                                      │
│  Model Training: LightGBM | LSTM | GRU | Logistic Reg.       │
│                        ↓                                      │
│  Explainability: SHAP values, feature importance             │
│                        ↓                                      │
│  SITL Dashboard: Cohort Builder | Training UI | AI Chat      │
└──────────────────────────────────────────────────────────────┘
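
To give a sense of what the explainability stage produces, here is a minimal sketch of SHAP values for the simplest case: a linear model (for example, the log-odds of a logistic regression) with features treated as independent. In that case each feature's SHAP value has the closed form w_i * (x_i - mean_i). This is an illustration, not the project's explainability code; tree and recurrent models need dedicated estimators, and the weights, bias, and feature values below are made up.

```python
def linear_shap(weights, x, background):
    """SHAP values for f(x) = bias + sum(w_i * x_i), assuming independent features.

    `background` is a list of reference rows used to estimate feature means.
    """
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def linear_predict(weights, bias, x):
    """Model output on the linear (log-odds) scale."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical model and data, for illustration only.
weights = [0.5, -1.0, 2.0]
bias = 0.25
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]  # feature means: [1, 2, 3]
patient = [3.0, 2.0, 4.0]

phi = linear_shap(weights, patient, background)
baseline = linear_predict(weights, bias, [1.0, 2.0, 3.0])

# Additivity: contributions sum to the prediction minus the baseline output.
print(phi, sum(phi), linear_predict(weights, bias, patient) - baseline)
```

The additivity check in the last line is the property that makes SHAP explanations readable on a dashboard: the per-feature contributions account exactly for the difference between this patient's score and the average score.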

Components

Component	Description	Docs
SITL Dashboard	Clinician validation interface	README
Pipeline	Training pipeline	README
Feature Selection	NTK, LightGBM, Elastic Net methods	Docs

Quick Start

# Clone and setup
git clone https://github.com/StonyBrookDB/PCORI.git
cd PCORI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run dashboard
cd SITL_Dashboard_PCORI
pip install -r requirements.txt
python -m uvicorn backend.main:app --port 8503

# Train model
cd ../pipeline-pcori
python train.py --dataset ./data/synth --model lightgbm

Documentation

Citation

@software{pcori_sitl_2026,
  title = {PCORI: Human-Centered AI for Clinical Decision Support},
  author = {Wang, Fusheng and Liu, Yinan and Ding, Zihan},
  year = {2026},
  publisher = {Stony Brook University},
  url = {https://github.com/StonyBrookDB/PCORI},
  note = {Funded by PCORI Contract 23C3}
}

Acknowledgments

Funding: Patient-Centered Outcomes Research Institute (PCORI) Contract 23C3

Research Team: Department of Biomedical Informatics, Stony Brook University

Data Partner: Cerner Corporation (Health Facts)

License

MIT License - see LICENSE

This software is for research purposes only. Clinical deployment requires regulatory approval.
