PASS: Profile of Analytic Suitability Score

A data quality assessment tool for OMOP Common Data Model databases. The Profile of Analytic Suitability Score (PASS) evaluates clinical data across six dimensions to quantify its fitness for research and analytics.

Overview

PASS calculates standardized metrics (0-1 scale) that measure different aspects of OMOP CDM data quality:

  • Accessibility: Are clinical facts present and discoverable?
  • Provenance: How well are facts coded and traceable to source data?
  • Standards: Are OHDSI standard concepts being used?
  • Concept Diversity: Is there variety in the concepts represented?
  • Source Diversity: How many different data sources contribute?
  • Temporal: How is data distributed over time?

Each metric produces field-level, table-level, and overall scores with 95% confidence intervals. A composite PASS aggregates individual metrics into a single quality measure.
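
For intuition, the composite behaves like a weighted average of the per-metric overall scores. A minimal sketch with made-up scores and equal weights (the actual aggregation, including confidence intervals, lives in R/composite_score.R):

# Illustrative only: made-up metric scores, equal weights
metric_scores <- c(accessibility = 0.92, provenance = 0.88, standards = 0.81,
                   concept_diversity = 0.70, source_diversity = 0.65, temporal = 0.77)
weights <- rep(1, length(metric_scores))

composite <- sum(metric_scores * weights) / sum(weights)
composite  # 0.788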

Metrics

Accessibility

Evaluates whether clinical facts exist in concept_id fields. Scores are tiered: 1.0 when a concept is present, 0.5 when only a source code is present, 0.05 when only free text is present, and 0.0 when the fact is absent. Includes pseudo-fields for custom completeness checks (e.g., measurement results, note text).
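
As a rough illustration of the tiering (the package evaluates this in SQL; see R/metrics/accessibility.R), a single record might be scored along these lines:

# Assumed tier logic for one record; illustrative, not the package's SQL
score_accessibility <- function(concept_id, source_concept_id, source_value) {
  if (!is.na(concept_id) && concept_id != 0) {
    1.0    # clinical fact captured as a concept
  } else if (!is.na(source_concept_id) && source_concept_id != 0) {
    0.5    # only a source code is available
  } else if (!is.na(source_value) && nzchar(source_value)) {
    0.05   # only free text is available
  } else {
    0.0    # fact is absent
  }
}

score_accessibility(201826, 0, "E11.9")  # 1.0
score_accessibility(0, 0, "diabetes")    # 0.05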

Provenance

Measures coding quality and source traceability. Native vocabulary usage scores 1.0, mapped codes 0.95, mapped text 0.75, and untraceable concepts 0.0.

Standards

Binary assessment of OHDSI standard concept usage. Standard concepts score 1.0, non-standard 0.0.

Concept Diversity

Shannon entropy of concept distributions within each field. Normalized to [0,1] where 1.0 indicates perfect diversity and 0.0 indicates no variety.
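
The normalization is observed entropy divided by the maximum entropy for the number of distinct concepts. A small sketch of the formula (the package computes the underlying counts in SQL):

# Normalized Shannon entropy of a field's concept counts
normalized_entropy <- function(counts) {
  p <- counts / sum(counts)
  h <- -sum(p * log(p))          # Shannon entropy (nats)
  h_max <- log(length(counts))   # maximum entropy for this many concepts
  if (h_max == 0) return(0)      # a single concept has no diversity
  h / h_max                      # scaled to [0, 1]
}

normalized_entropy(c(100, 100, 100, 100))  # 1.0  (perfectly even)
normalized_entropy(c(997, 1, 1, 1))        # ~0.02 (dominated by one concept)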

Source Diversity

Counts unique type_concept_id values per table using exponential decay normalization (1 - exp(-n/k)). Asymptotically approaches 1.0 as source count increases.
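
The stated normalization is easy to sketch; the half-saturation constant k below is an assumed example value, not necessarily the package default:

# 1 - exp(-n/k) for n distinct type_concept_id values; k = 3 is assumed
source_diversity <- function(n_sources, k = 3) {
  1 - exp(-n_sources / k)
}

source_diversity(1)   # ~0.28
source_diversity(5)   # ~0.81
source_diversity(20)  # ~1.00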

Temporal

Combines three sub-scores: range (years of coverage), density (rows per patient per quarter), and consistency (temporal stability via coefficient of variation).
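
As a sketch of the consistency idea only (the exact sub-score definitions and how the three are combined are documented in the scoring_methodology vignette; the transform below is an assumption):

# A stable quarterly row count should yield a consistency score near 1
quarterly_counts <- c(120, 135, 128, 140, 118, 131)   # example rows per quarter

cv          <- sd(quarterly_counts) / mean(quarterly_counts)  # coefficient of variation
consistency <- 1 / (1 + cv)                                   # assumed mapping to [0, 1]
round(consistency, 3)  # 0.938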

Usage

Basic Example

library(pass)

# Create database connection
conn <- create_pass_connection(
  project_id = "my-project",
  dataset = "omop_cdm",
  jdbc_driver_path = "~/bigquery_driver/"
)

# Load default configuration
config <- load_pass_config()

# Calculate all metrics
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  output_dir = "output/"
)

# Disconnect
disconnect_pass(conn)

Calculate Specific Metrics

# Run only accessibility and temporal
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = c("accessibility", "temporal"),
  output_dir = "output/"
)

Custom Configuration

# Load custom configuration files
config <- load_pass_config(
  concept_fields_path = "path/to/custom_concept_fields.csv",
  type_fields_path = "path/to/custom_type_fields.csv",
  date_fields_path = "path/to/custom_date_fields.csv"
)

results <- calculate_pass(conn, schema, config)

Custom Composite Weights

# Adjust metric weights in composite score
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  composite_weights = list(
    accessibility = 1.5,
    provenance = 1.0,
    standards = 1.0,
    concept_diversity = 0.5,
    source_diversity = 1.0,
    temporal = 1.0
  )
)

Configuration

The package includes default configuration files that define which fields to evaluate. These can be customized by providing your own CSV files.

Default Configuration Files

Configuration files are located in inst/config/:

concept_fields_with_weights.csv: Defines which concept_id fields to evaluate and their analytical importance weights (0-1 scale).

table,concept_id_field,source_concept_id_field,source_value_field,multiplier,rationale
condition_occurrence,condition_concept_id,condition_source_concept_id,condition_source_value,1.0,Primary diagnosis field

type_concept_id_fields.csv: Specifies the type_concept_id fields used in source diversity analysis.

table,type_concept_id
condition_occurrence,condition_type_concept_id

date_fields.csv: Defines the primary date fields used for temporal analysis.

table,date_field
condition_occurrence,condition_start_date

Customizing Field Weights

To adjust field importance in your analysis:

  1. Export the default configuration:

default_config <- system.file("config", "concept_fields_with_weights.csv", package = "pass")
file.copy(default_config, "my_custom_config.csv")

  2. Edit my_custom_config.csv to adjust the multipliers

  3. Load the custom configuration:

config <- load_pass_config(concept_fields_path = "my_custom_config.csv")

Output

Results are written as CSV files to the directory given by output_dir (output/ in the examples above):

Per-Metric Output

Each metric generates three files:

  • pass_{metric}_field_level.csv - Scores for each concept_id field
  • pass_{metric}_table_level.csv - Aggregated scores per table
  • pass_{metric}_overall.csv - Dataset-wide score with confidence interval

Composite Score Output

  • pass_composite_overall.csv - Weighted composite PASS
  • pass_composite_components.csv - Individual metric contributions
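
For example, results from a run with output_dir = "output/" can be read back in with base R (file names follow the pass_{metric}_*.csv pattern above):

composite      <- read.csv("output/pass_composite_overall.csv")
access_overall <- read.csv("output/pass_accessibility_overall.csv")
access_fields  <- read.csv("output/pass_accessibility_field_level.csv")

head(access_fields)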

Interpreting Scores

  • 1.0: Perfect quality on this dimension
  • 0.8-0.99: Good quality with minor issues
  • 0.6-0.79: Moderate quality, room for improvement
  • 0.4-0.59: Poor quality, significant gaps
  • < 0.4: Very poor quality, major data issues
  • NA: Not evaluated (e.g., empty table, insufficient data)

Package Structure

pass/
├── DESCRIPTION                 # Package metadata
├── NAMESPACE                   # Exported functions
├── R/                         # R source code
│   ├── calculate_pass.R       # Main user function
│   ├── config_helpers.R       # Configuration loading
│   ├── connection_helpers.R   # Database connection
│   ├── config.R
│   ├── connection.R
│   ├── composite_score.R      # Composite score calculation
│   └── metrics/
│       ├── accessibility.R
│       ├── provenance.R
│       ├── standards.R
│       ├── concept_diversity.R
│       ├── source_diversity.R
│       ├── temporal.R
│       └── domain_completeness.R
├── inst/
│   ├── config/               # Default configuration files
│   │   ├── concept_fields_with_weights.csv
│   │   ├── type_concept_id_fields.csv
│   │   └── date_fields.csv
│   └── examples/             # Example usage scripts
│       └── calculate_pass_example.R
├── man/                      # Function documentation (auto-generated)
├── vignettes/                # Package vignettes
│   └── scoring_methodology.Rmd
└── README.md

Advanced Usage

Pseudo-Fields

Create custom completeness checks by adding pseudo-fields (prefix with __) to your custom configuration:

measurement,value_as_concept_id,,,0,Not evaluated - see pseudo-field
measurement,__result_completeness__,,,1.0,Custom result completeness logic

Then implement custom logic by modifying R/metrics/accessibility.R:build_pseudo_field_sql().
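
A hypothetical branch for the __result_completeness__ example above might emit SQL along these lines (the real function's structure and argument names may differ; treat this purely as a sketch):

# Hypothetical sketch only -- not the actual build_pseudo_field_sql() body
if (pseudo_field == "__result_completeness__") {
  sql <- paste0(
    "SELECT AVG(CASE WHEN value_as_number IS NOT NULL ",
    "OR value_as_concept_id IS NOT NULL THEN 1.0 ELSE 0.0 END) AS field_score ",
    "FROM ", schema, ".measurement"
  )
}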

Programmatic Access

Access results programmatically without saving to files:

results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  output_dir = NULL  # Don't save CSV files
)

# Access overall scores
accessibility_score <- results$accessibility$overall$overall_score
temporal_score <- results$temporal$overall$overall_temporal_score

# Access field-level details
field_scores <- results$accessibility$field_level
