diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index af3a1b11..3ba53940 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -2,12 +2,17 @@ ## Repository Overview -**serodynamics** is an R package for modeling longitudinal antibody responses to infection. It implements Bayesian MCMC methods using JAGS (Just Another Gibbs Sampler) to estimate antibody dynamic curve parameters including baseline concentration, peak concentration, time to peak, shape parameter, and decay rate. +**serodynamics** is an R package for modeling longitudinal antibody responses to infection. It implements Bayesian MCMC methods to estimate antibody dynamic curve parameters including baseline concentration, peak concentration, time to peak, shape parameter, and decay rate. + +The package supports two Bayesian modeling backends: +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) - **Type**: R package (statistical modeling) - **Size**: ~121MB, ~209 files, ~76 R source files, ~1,664 lines of R code - **Language**: R (>= 4.1.0) - **Key Dependencies**: runjags, rjags, JAGS 4.3.1, serocalculator, ggmcmc, dplyr, ggplot2 +- **Optional Dependencies**: cmdstanr (for Stan support) - **Lifecycle**: Experimental (under active development) ## Critical Setup Requirements @@ -213,9 +218,9 @@ devtools::check_man() - **Windows**: Install Rtools from https://cran.r-project.org/bin/windows/Rtools/ (choose version matching your R version) -### JAGS Installation (REQUIRED) +### JAGS Installation (REQUIRED for JAGS models) -**ALWAYS install JAGS before attempting to build, test, or run this package.** The package will fail without it. +**Install JAGS if you plan to use `run_mod()` with JAGS models.** JAGS is required for the original JAGS-based modeling functions but is not needed if you only use Stan models via `run_mod_stan()`. #### Installing JAGS in Docker (if using rocker/verse) @@ -250,6 +255,25 @@ runjags::findJAGS() runjags::testjags() ``` +### Stan Installation (OPTIONAL for Stan models) + +**Install cmdstanr and CmdStan if you plan to use `run_mod_stan()`.** Stan support is optional and provides a modern alternative to JAGS. + +```r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() + +# Verify installation +cmdstanr::cmdstan_version() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more details and troubleshooting. + ## Build and Development Workflow ### Initial Setup @@ -374,10 +398,14 @@ Team members can trigger actions by commenting on PRs: ### Key Directories -- **R/**: Package source code (30 R files) +- **R/**: Package source code (30+ R files) - `Run_Mod.R`: Main function to run JAGS Bayesian models + - `run_mod_stan.R`: Main function to run Stan Bayesian models (new) - `as_case_data.R`: Convert data to case_data class - - `prep_data.r`, `prep_priors.R`: Data preparation for JAGS + - `prep_data.r`: Data preparation for JAGS + - `prep_data_stan.R`: Data preparation for Stan (new) + - `prep_priors.R`: Prior preparation for JAGS + - `prep_priors_stan.R`: Prior preparation for Stan (new) - `sim_case_data.R`: Simulate case data for testing - `post_summ.R`, `postprocess_jags_output.R`: Post-processing JAGS results - `plot_*.R`: Diagnostic plotting functions (trace, density, Rhat, effective sample size) @@ -397,8 +425,13 @@ Team members can trigger actions by commenting on PRs: - **data-raw/**: Raw data processing scripts (not included in package build) - **inst/**: Installed files - - `inst/extdata/`: JAGS model files (`model.jags`, `model.dobson.jags`), example CSV data + - `inst/extdata/`: Model files + - `model.jags`, `model.dobson.jags`: JAGS model files + - `model.stan`, `model.dobson.stan`: Stan model files (new) + - Example CSV data files - `inst/examples/`: Example R scripts for documentation + - JAGS examples: `run_mod-examples.R`, `examples-prep_priors.R` + - Stan examples: `run_mod_stan-examples.R`, `examples-prep_priors_stan.R`, `examples-prep_data_stan.R` (new) - `inst/WORDLIST`: Custom spelling dictionary - **vignettes/**: Package vignettes diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml index 251368ad..2aec2bce 100644 --- a/.github/workflows/copilot-setup-steps.yml +++ b/.github/workflows/copilot-setup-steps.yml @@ -106,6 +106,19 @@ jobs: runjags::testjags() shell: Rscript {0} + # Install cmdstanr for Stan modeling support (optional) + # See .github/copilot-instructions.md "Stan Installation" section + - name: Install cmdstanr + run: | + install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + "https://cloud.r-project.org"), + verbose = TRUE) + cmdstanr::install_cmdstan(cores = 2) + cat("cmdstanr version:", as.character(packageVersion("cmdstanr")), "\n") + cat("CmdStan version:", cmdstanr::cmdstan_version(), "\n") + shell: Rscript {0} + # Verify JAGS installation # See .github/copilot-instructions.md "JAGS Installation" section for details - name: Verify JAGS installation @@ -144,7 +157,7 @@ jobs: # Display key installed packages cat("Key installed packages:\n") - key_packages <- c("devtools", "rjags", "runjags", "rcmdcheck", "lintr", "spelling", "testthat") + key_packages <- c("devtools", "rjags", "runjags", "cmdstanr", "rcmdcheck", "lintr", "spelling", "testthat") for (pkg in key_packages) { if (requireNamespace(pkg, quietly = TRUE)) { cat(" -", pkg, ":", as.character(packageVersion(pkg)), "\n") diff --git a/.gitignore b/.gitignore index e21e5425..8687c953 100644 --- a/.gitignore +++ b/.gitignore @@ -28,3 +28,4 @@ docs/ **/*.quarto_ipynb README.html +*.knit.* diff --git a/DESCRIPTION b/DESCRIPTION index ea5d92de..fb9d43e3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: serodynamics Title: What the Package Does (One Line, Title Case) -Version: 0.0.0.9047 +Version: 0.0.0.9048 Authors@R: c( person("Peter", "Teunis", , "p.teunis@emory.edu", role = c("aut", "cph"), comment = "Author of the method and original code."), @@ -33,6 +33,7 @@ Imports: tidyselect, utils Suggests: + cmdstanr, Hmisc, knitr, readr, @@ -63,6 +64,7 @@ Config/needs/test: rlist Config/needs/check: commonmark, xml2 Remotes: + stan-dev/cmdstanr, ucd-serg/serocalculator Depends: R (>= 4.1.0) diff --git a/NAMESPACE b/NAMESPACE index 2db4d027..13d233a3 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -16,8 +16,11 @@ export(plot_predicted_curve) export(post_summ) export(postprocess_jags_output) export(prep_data) +export(prep_data_stan) export(prep_priors) +export(prep_priors_stan) export(run_mod) +export(run_mod_stan) export(serodynamics_example) export(sim_case_data) export(sim_n_obs) diff --git a/NEWS.md b/NEWS.md index 89199522..02ffd1ff 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,14 @@ # serodynamics (development version) +## New features + +* Added Stan support as an alternative to JAGS for Bayesian modeling + * New `run_mod_stan()` function for fitting models with Stan/cmdstanr + * New `prep_data_stan()` function to prepare data in Stan format + * New `prep_priors_stan()` function to prepare priors for Stan models + * New Stan model files: `inst/extdata/model.stan` and `inst/extdata/model.dobson.stan` + * Stan support is optional (cmdstanr in Suggests) and can be used alongside JAGS + * Added dev container configuration for persistent, cached development environment that includes R, JAGS, and all dependencies preinstalled, making Copilot Workspace sessions much faster. diff --git a/R/prep_data_stan.R b/R/prep_data_stan.R new file mode 100644 index 00000000..7c437500 --- /dev/null +++ b/R/prep_data_stan.R @@ -0,0 +1,76 @@ +#' Prepare data for Stan +#' +#' @param dataframe a [data.frame] containing case data +#' @param biomarker_column [character] string indicating +#' which column contains antigen-isotype names +#' @param verbose whether to produce verbose messaging +#' @param add_newperson whether to add an extra record with missing data +#' +#' @returns a `prepped_stan_data` object (a [list] with Stan-formatted data) +#' @export +#' +#' @examples +#' set.seed(1) +#' raw_data <- +#' serocalculator::typhoid_curves_nostrat_100 |> +#' sim_case_data(n = 5) +#' prepped_data <- prep_data_stan(raw_data) +prep_data_stan <- function( + dataframe, + biomarker_column = get_biomarker_names_var(dataframe), + verbose = FALSE, + add_newperson = TRUE) { + + # First use existing prep_data function to get the base structure + jags_data <- prep_data( + dataframe = dataframe, + biomarker_column = biomarker_column, + verbose = verbose, + add_newperson = add_newperson + ) + + # Convert to Stan format + # Stan requires explicit max dimensions + max_nsmpl <- max(jags_data$nsmpl) + + # Create padded arrays (Stan doesn't handle ragged arrays like JAGS) + # We need to pad smpl.t and logy to max_nsmpl + nsubj <- jags_data$nsubj + n_antigen_isos <- jags_data$n_antigen_isos + + # Initialize with zeros (will be ignored in model for obs > nsmpl[subj]) + smpl_t_padded <- array(0, dim = c(nsubj, max_nsmpl)) + logy_padded <- array(0, dim = c(nsubj, max_nsmpl, n_antigen_isos)) + + # Fill in actual data + for (subj in 1:nsubj) { + n_obs <- jags_data$nsmpl[subj] + if (n_obs > 0) { + smpl_t_padded[subj, 1:n_obs] <- jags_data$smpl.t[subj, 1:n_obs] + for (k in 1:n_antigen_isos) { + logy_padded[subj, 1:n_obs, k] <- jags_data$logy[subj, 1:n_obs, k] + } + } + } + + stan_data <- list( + nsubj = nsubj, + n_antigen_isos = n_antigen_isos, + n_params = 5, # y0, y1, t1, alpha, shape + nsmpl = as.integer(jags_data$nsmpl), + max_nsmpl = as.integer(max_nsmpl), + smpl_t = smpl_t_padded, + logy = logy_padded + ) + + # Add attributes from JAGS data + stan_data <- stan_data |> + structure( + class = c("prepped_stan_data", "list"), + antigens = attributes(jags_data)$antigens, + n_antigens = attributes(jags_data)$n_antigens, + ids = attributes(jags_data)$ids + ) + + return(stan_data) +} diff --git a/R/prep_priors_stan.R b/R/prep_priors_stan.R new file mode 100644 index 00000000..73abb9d1 --- /dev/null +++ b/R/prep_priors_stan.R @@ -0,0 +1,80 @@ +#' @title Prepare priors for Stan +#' @description +#' Takes multiple [vector] inputs to allow for modifiable priors for Stan +#' models. Converts JAGS precision-based priors to Stan covariance-based +#' priors. +#' +#' @inheritParams prep_priors +#' +#' @returns A "curve_params_priors_stan" object +#' (a subclass of [list] with the inputs to `prep_priors_stan()` attached +#' as [attributes] entry named `"used_priors"`), containing Stan-formatted +#' priors. +#' @export +#' @example inst/examples/examples-prep_priors_stan.R + +prep_priors_stan <- function( + max_antigens, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) { + + # Input validation (same as prep_priors) + if (length(mu_hyp_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg mu_hyp_param}") + } + if (length(prec_hyp_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg prec_hyp_param}") + } + if (length(omega_param) != 5) { + cli::cli_abort("Need to specify 5 priors for {.arg omega_param}") + } + if (length(wishdf_param) != 1) { + cli::cli_abort("Need to specify 1 prior for {.arg wishdf_param}") + } + if (length(prec_logy_hyp_param) != 2) { + cli::cli_abort("Need to specify 2 priors for {.arg prec_logy_hyp_param}") + } + + # Model parameters + n_params <- 5 + mu_hyp <- array(NA, dim = c(max_antigens, n_params)) + prec_hyp <- array(NA, dim = c(max_antigens, n_params, n_params)) + omega <- array(NA, dim = c(max_antigens, n_params, n_params)) + wishdf <- rep(NA, max_antigens) + prec_logy_hyp <- array(NA, dim = c(max_antigens, 2)) + + # Fill parameter arrays (same structure as JAGS) + for (k in 1:max_antigens) { + mu_hyp[k, ] <- mu_hyp_param + prec_hyp[k, , ] <- diag(prec_hyp_param) + omega[k, , ] <- diag(omega_param) + wishdf[k] <- wishdf_param + prec_logy_hyp[k, ] <- prec_logy_hyp_param + } + + # Return results as a list (Stan model will handle conversion) + prepped_priors <- list( + "n_params" = n_params, + "mu_hyp" = mu_hyp, + "prec_hyp" = prec_hyp, + "omega" = omega, + "wishdf" = wishdf, + "prec_logy_hyp" = prec_logy_hyp + ) |> + structure(class = c("curve_params_priors_stan", "list")) + + # Add used priors as attributes + prepped_priors <- prepped_priors |> + structure("used_priors" = list( + mu_hyp_param = mu_hyp_param, + prec_hyp_param = prec_hyp_param, + omega_param = omega_param, + wishdf_param = wishdf_param, + prec_logy_hyp_param = prec_logy_hyp_param + )) + + return(prepped_priors) +} diff --git a/R/run_mod_stan.R b/R/run_mod_stan.R new file mode 100644 index 00000000..0ce56736 --- /dev/null +++ b/R/run_mod_stan.R @@ -0,0 +1,205 @@ +#' @title Run Stan Model +#' @author Sam Schildhauer, GitHub Copilot +#' @description +#' `run_mod_stan()` takes a data frame and adjustable MCMC inputs to fit a +#' Bayesian model using Stan (via cmdstanr) to estimate antibody dynamic +#' curve parameters. The model estimates seroresponse dynamics to an +#' infection. The antibody dynamic curve includes the following parameters: +#' - y0 = baseline antibody concentration +#' - y1 = peak antibody concentration +#' - t1 = time to peak +#' - shape = shape parameter +#' - alpha = decay rate +#' @param data A [base::data.frame()] with the required columns (see details). +#' @param file_mod The name of the file that contains model structure +#' (a .stan file). +#' @param nchain An [integer] between 1 and 4 that specifies +#' the number of MCMC chains to be run per Stan model. +#' @param nadapt An [integer] specifying the number of warmup/adaptation +#' iterations per chain (Stan equivalent of JAGS adapt + burnin). +#' @param niter An [integer] specifying the number of post-warmup iterations. +#' @param strat A [character] string specifying the stratification variable, +#' entered in quotes. +#' @param with_post A [logical] value specifying whether a raw `stan_fit` +#' component should be included as an element of the [list] object returned +#' by `run_mod_stan()` (see `Value` section below for details). +#' Note: These objects can be large. +#' @param ... Additional arguments passed to `prep_priors_stan()`. +#' @returns An `sr_model` class object: a subclass of [dplyr::tbl_df] that +#' contains MCMC samples from the joint posterior distribution of the model +#' parameters, conditional on the provided input `data`, +#' including the same structure as `run_mod()`. +#' @inheritDotParams prep_priors_stan +#' @export +#' @example inst/examples/run_mod_stan-examples.R +run_mod_stan <- function(data, + file_mod = serodynamics_example("model.stan"), + nchain = 4, + nadapt = 1000, + niter = 1000, + strat = NA, + with_post = FALSE, + ...) { + + # Check if cmdstanr is available + if (!requireNamespace("cmdstanr", quietly = TRUE)) { + cli::cli_abort( + c( + "Package {.pkg cmdstanr} is required but not installed.", + "i" = paste0( + "Install it with: ", + "{.code install.packages('cmdstanr', ", + "repos = c('https://stan-dev.r-universe.dev', ", + "getOption('repos')))}" + ) + ) + ) + } + + ## Conditionally creating a stratification list to loop through + if (is.na(strat)) { + strat_list <- "None" + } else { + strat_list <- unique(data[[strat]]) + } + + ## Creating a shell to output results + stan_out <- data.frame( + "Iteration" = NA, + "Chain" = NA, + "Parameter" = NA, + "value" = NA, + "Parameter_sub" = NA, + "Subject" = NA, + "Iso_type" = NA, + "Stratification" = NA + ) + + ## Creating output list for stan fit objects + stan_fit_final <- list() + + # For loop for running stratifications + for (i in strat_list) { + # Creating if else statement for running the loop + if (is.na(strat)) { + dl_sub <- data + } else { + dl_sub <- data |> + dplyr::filter(.data[[strat]] == i) + } + + # prepare data for modeling + longdata <- prep_data_stan(dl_sub) + priorspec <- prep_priors_stan(max_antigens = longdata$n_antigen_isos, ...) + + # Combine data and priors for Stan + stan_data <- c(longdata, priorspec) + + # Compile and fit the Stan model + mod <- cmdstanr::cmdstan_model(file_mod) + + stan_fit <- mod$sample( + data = stan_data, + chains = nchain, + parallel_chains = nchain, + iter_warmup = nadapt, + iter_sampling = niter, + refresh = 0, # Suppress iteration messages + show_messages = FALSE + ) + + # Store raw Stan fit if requested + stan_fit_final[[i]] <- stan_fit + + # Extract samples and convert to ggmcmc format + draws <- stan_fit$draws( + variables = c("y0", "y1", "t1", "alpha", "shape"), + format = "draws_array" + ) + + # Convert to mcmc.list format compatible with ggmcmc + mcmc_list <- list() + for (ch in 1:nchain) { + mcmc_list[[ch]] <- coda::as.mcmc(draws[ch, , ]) + } + mcmc_list <- coda::as.mcmc.list(mcmc_list) + + # Use ggmcmc to process + stan_unpack <- ggmcmc::ggs(mcmc_list) + + # Adding attributes + mod_atts <- attributes(stan_unpack) + # Only keeping necessary attributes + mod_atts <- mod_atts[4:8] + + # extracting antigen-iso combinations + iso_dat <- data.frame(attributes(longdata)$antigens) + iso_dat <- iso_dat |> dplyr::mutate(Subnum = as.numeric(row.names(iso_dat))) + + # Working with stan unpacked ggs outputs to clarify parameter and subject + stan_unpack <- stan_unpack |> + dplyr::mutate( + Subnum = sub(".*,", "", .data$Parameter), + Parameter_sub = sub("\\[.*", "", .data$Parameter), + Subject = sub("\\,.*", "", .data$Parameter) + ) |> + dplyr::mutate( + Subnum = as.numeric(sub("\\].*", "", .data$Subnum)), + Subject = sub(".*\\[", "", .data$Subject) + ) + + # Merging isodat + stan_unpack <- dplyr::left_join(stan_unpack, iso_dat, by = "Subnum") + ids <- data.frame(attr(longdata, "ids")) |> + mutate(Subject = as.character(dplyr::row_number())) + stan_unpack <- dplyr::left_join(stan_unpack, ids, by = "Subject") + + stan_final <- stan_unpack |> + dplyr::select(!c("Subnum", "Subject")) |> + dplyr::rename( + c("Iso_type" = "attributes.longdata..antigens", + "Subject" = "attr.longdata...ids..") + ) + + # Creating a label for the stratification + stan_final$Stratification <- i + + ## Creating output + stan_out <- data.frame(rbind(stan_out, stan_final)) + } + + # Ensuring output does not have any NAs + stan_out <- stan_out[complete.cases(stan_out), ] + + # Making output a tibble and restructuring + stan_out <- dplyr::as_tibble(stan_out) |> + select(!c("Parameter")) |> + rename("Parameter" = "Parameter_sub") + stan_out <- stan_out[, c("Iteration", "Chain", "Parameter", "Iso_type", + "Stratification", "Subject", "value")] + + current_atts <- attributes(stan_out) + current_atts <- c(current_atts, mod_atts) + attributes(stan_out) <- current_atts + + # Adding priors + stan_out <- stan_out |> + structure("priors" = attributes(priorspec)$used_priors) + + # Calculating fitted and residuals + fit_res <- calc_fit_mod(modeled_dat = stan_out, + original_data = dl_sub) + stan_out <- stan_out |> + structure(fitted_residuals = fit_res) + + # Conditionally adding stan fit + if (with_post) { + stan_out <- stan_out |> + structure(stan.fit = stan_fit_final) + } + + stan_out <- stan_out |> + structure(class = union("sr_model", class(stan_out))) + + stan_out +} diff --git a/README.Rmd b/README.Rmd index 58d2a8de..923055b2 100644 --- a/README.Rmd +++ b/README.Rmd @@ -25,6 +25,12 @@ knitr::opts_chunk$set( The goal of `{serodynamics}` is to implement methods for modeling longitudinal antibody responses to infection. +The package provides Bayesian MCMC modeling capabilities using either: +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) + +Both interfaces use the same data preparation and analysis workflow, allowing users to choose their preferred Bayesian modeling framework. + ## Installation You can install the development version of `{serodynamics}` from [GitHub](https://github.com/) with: @@ -33,3 +39,19 @@ You can install the development version of `{serodynamics}` from [GitHub](https: # install.packages("pak") pak::pak("UCD-SERG/serodynamics") ``` + +### Stan Support (Optional) + +To use Stan models (via `run_mod_stan()`), you'll also need to install `cmdstanr`: + +``` r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more details. diff --git a/README.md b/README.md index 5182a51f..d37fb842 100644 --- a/README.md +++ b/README.md @@ -1,29 +1,50 @@ +--- +output: github_document +--- + + # `{serodynamics}` - -[![Codecov test -coverage](https://codecov.io/gh/UCD-SERG/serodynamics/graph/badge.svg)](https://app.codecov.io/gh/UCD-SERG/serodynamics) +[![Codecov test coverage](https://codecov.io/gh/UCD-SERG/serodynamics/graph/badge.svg)](https://app.codecov.io/gh/UCD-SERG/serodynamics) [![CodeFactor](https://www.codefactor.io/repository/github/ucd-serg/serodynamics/badge)](https://www.codefactor.io/repository/github/ucd-serg/serodynamics) [![R-CMD-check](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/UCD-SERG/dcm/actions/workflows/R-CMD-check.yaml) -[![CRAN -status](https://www.r-pkg.org/badges/version/serodynamics)](https://CRAN.R-project.org/package=serodynamics) -[![Lifecycle: -experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) +[![CRAN status](https://www.r-pkg.org/badges/version/serodynamics)](https://CRAN.R-project.org/package=serodynamics) +[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) -The goal of `{serodynamics}` is to implement methods for modeling -longitudinal antibody responses to infection. +The goal of `{serodynamics}` is to implement methods for modeling longitudinal antibody responses to infection. + +The package provides Bayesian MCMC modeling capabilities using either: +- **JAGS** (Just Another Gibbs Sampler) via `runjags` - the original implementation +- **Stan** via `cmdstanr` - a modern, efficient alternative (optional) + +Both interfaces use the same data preparation and analysis workflow, allowing users to choose their preferred Bayesian modeling framework. ## Installation -You can install the development version of `{serodynamics}` from -[GitHub](https://github.com/) with: +You can install the development version of `{serodynamics}` from [GitHub](https://github.com/) with: ``` r # install.packages("pak") pak::pak("UCD-SERG/serodynamics") ``` + +### Stan Support (Optional) + +To use Stan models (via `run_mod_stan()`), you'll also need to install `cmdstanr`: + +``` r +# Install cmdstanr from r-universe +install.packages("cmdstanr", + repos = c("https://stan-dev.r-universe.dev", + getOption("repos"))) + +# Then install CmdStan +cmdstanr::install_cmdstan() +``` + +See the [cmdstanr documentation](https://mc-stan.org/cmdstanr/) for more details. diff --git a/inst/WORDLIST b/inst/WORDLIST index 8ea3768d..47c32908 100644 --- a/inst/WORDLIST +++ b/inst/WORDLIST @@ -18,6 +18,8 @@ Wasserstein behaviour biomarker biomarkers +burnin +cmdstanr colour darwin dev @@ -48,6 +50,8 @@ rhat seroconversion seroresponse seroresponses +stan +stanfit strat stratifications tbl diff --git a/inst/examples/examples-prep_data_stan.R b/inst/examples/examples-prep_data_stan.R new file mode 100644 index 00000000..9804ace8 --- /dev/null +++ b/inst/examples/examples-prep_data_stan.R @@ -0,0 +1,5 @@ +set.seed(1) +raw_data <- + serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 5) +prepped_data <- prep_data_stan(raw_data) diff --git a/inst/examples/examples-prep_priors_stan.R b/inst/examples/examples-prep_priors_stan.R new file mode 100644 index 00000000..6cf9ea6a --- /dev/null +++ b/inst/examples/examples-prep_priors_stan.R @@ -0,0 +1,9 @@ + +prep_priors_stan(max_antigens = 2, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) + +prep_priors_stan(max_antigens = 2) diff --git a/inst/examples/run_mod_stan-examples.R b/inst/examples/run_mod_stan-examples.R new file mode 100644 index 00000000..f698e096 --- /dev/null +++ b/inst/examples/run_mod_stan-examples.R @@ -0,0 +1,24 @@ +\dontrun{ + # This example requires cmdstanr and CmdStan to be installed + # See ?run_mod_stan for installation instructions + + library(dplyr) + set.seed(1) + strat1 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 2") + strat2 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 1") + + dataset <- bind_rows(strat1, strat2) + + fitted_model <- run_mod_stan( + data = dataset, # The data set input + file_mod = serodynamics_example("model.stan"), + nchain = 4, # Number of mcmc chains to run + nadapt = 500, # Number of warmup iterations + niter = 1000, # Number of sampling iterations + strat = "strat" # Variable to be stratified + ) +} diff --git a/inst/extdata/model.dobson.stan b/inst/extdata/model.dobson.stan new file mode 100644 index 00000000..b6d8e922 --- /dev/null +++ b/inst/extdata/model.dobson.stan @@ -0,0 +1,19 @@ +// Simple Bernoulli model (Dobson example) +// Translated from model.dobson.jags + +data { + int N; // number of observations + array[N] int r; // binary outcomes +} + +parameters { + real p; // probability parameter +} + +model { + // Prior: Beta(1, 1) = Uniform(0, 1) + p ~ beta(1, 1); + + // Likelihood: Bernoulli trials + r ~ bernoulli(p); +} diff --git a/inst/extdata/model.stan b/inst/extdata/model.stan new file mode 100644 index 00000000..d9ec81f1 --- /dev/null +++ b/inst/extdata/model.stan @@ -0,0 +1,117 @@ +// Stan model for antibody dynamics +// Translated from JAGS model.jags +// Models longitudinal antibody responses with hierarchical priors + +data { + int nsubj; // number of subjects + int n_antigen_isos; // number of biomarkers + int n_params; // number of parameters (5) + array[nsubj] int nsmpl; // number of samples per subject + int max_nsmpl; // maximum number of samples + array[nsubj, max_nsmpl] real smpl_t; // sample times + array[nsubj, max_nsmpl, n_antigen_isos] real logy; // log antibody measurements + + // Hyperpriors + array[n_antigen_isos] vector[n_params] mu_hyp; // hyperprior means + array[n_antigen_isos] matrix[n_params, n_params] prec_hyp; // hyperprior precision + array[n_antigen_isos] matrix[n_params, n_params] omega; // Wishart scale matrix + array[n_antigen_isos] real wishdf; // Wishart degrees of freedom + array[n_antigen_isos, 2] real prec_logy_hyp; // gamma hyperpriors for precision +} + +transformed data { + // Convert precision matrices to covariance matrices for Stan + array[n_antigen_isos] matrix[n_params, n_params] sigma_hyp; + array[n_antigen_isos] matrix[n_params, n_params] omega_inv; + + for (k in 1:n_antigen_isos) { + sigma_hyp[k] = inverse(prec_hyp[k]); + omega_inv[k] = inverse(omega[k]); + } +} + +parameters { + // Subject-level parameters (transformed scale) + array[nsubj, n_antigen_isos] vector[n_params] par; + + // Population-level parameters + array[n_antigen_isos] vector[n_params] mu_par; + + // Covariance matrices (Stan uses covariance, not precision) + array[n_antigen_isos] cov_matrix[n_params] Sigma_par; + + // Observation precision (measurement error) + array[n_antigen_isos] real prec_logy; +} + +transformed parameters { + // Subject-level parameters on natural scale + array[nsubj, n_antigen_isos] real y0; + array[nsubj, n_antigen_isos] real y1; + array[nsubj, n_antigen_isos] real t1; + array[nsubj, n_antigen_isos] real alpha; + array[nsubj, n_antigen_isos] real shape; + + // Growth rate parameter + array[nsubj, n_antigen_isos] real beta; + + // Transform parameters + for (subj in 1:nsubj) { + for (k in 1:n_antigen_isos) { + y0[subj, k] = exp(par[subj, k][1]); + y1[subj, k] = y0[subj, k] + exp(par[subj, k][2]); // par[,,2] is log(y1-y0) + t1[subj, k] = exp(par[subj, k][3]); + alpha[subj, k] = exp(par[subj, k][4]); + shape[subj, k] = exp(par[subj, k][5]) + 1; + + // Compute growth rate + beta[subj, k] = log(y1[subj, k] / y0[subj, k]) / t1[subj, k]; + } + } +} + +model { + // Hyperpriors + for (k in 1:n_antigen_isos) { + mu_par[k] ~ multi_normal(mu_hyp[k], sigma_hyp[k]); + Sigma_par[k] ~ inv_wishart(wishdf[k], omega_inv[k]); + prec_logy[k] ~ gamma(prec_logy_hyp[k, 1], prec_logy_hyp[k, 2]); + } + + // Subject-level priors + for (subj in 1:nsubj) { + for (k in 1:n_antigen_isos) { + par[subj, k] ~ multi_normal(mu_par[k], Sigma_par[k]); + } + } + + // Likelihood + for (subj in 1:nsubj) { + for (obs in 1:nsmpl[subj]) { + for (k in 1:n_antigen_isos) { + real mu_logy; + real sigma_logy; + + sigma_logy = 1.0 / sqrt(prec_logy[k]); + + // Determine phase: active infection or recovery + if (smpl_t[subj, obs] <= t1[subj, k]) { + // Active infection period (t <= t1) + mu_logy = log(y0[subj, k]) + beta[subj, k] * smpl_t[subj, obs]; + } else { + // Recovery period (t > t1) + real shape_term = shape[subj, k]; + real y1_val = y1[subj, k]; + real t_diff = smpl_t[subj, obs] - t1[subj, k]; + real alpha_val = alpha[subj, k]; + + mu_logy = (1.0 / (1.0 - shape_term)) * + log(y1_val^(1.0 - shape_term) - + (1.0 - shape_term) * alpha_val * t_diff); + } + + logy[subj, obs, k] ~ normal(mu_logy, sigma_logy); + } + } + } +} diff --git a/man/prep_data_stan.Rd b/man/prep_data_stan.Rd new file mode 100644 index 00000000..5c7e674a --- /dev/null +++ b/man/prep_data_stan.Rd @@ -0,0 +1,36 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prep_data_stan.R +\name{prep_data_stan} +\alias{prep_data_stan} +\title{Prepare data for Stan} +\usage{ +prep_data_stan( + dataframe, + biomarker_column = get_biomarker_names_var(dataframe), + verbose = FALSE, + add_newperson = TRUE +) +} +\arguments{ +\item{dataframe}{a \link{data.frame} containing case data} + +\item{biomarker_column}{\link{character} string indicating +which column contains antigen-isotype names} + +\item{verbose}{whether to produce verbose messaging} + +\item{add_newperson}{whether to add an extra record with missing data} +} +\value{ +a \code{prepped_stan_data} object (a \link{list} with Stan-formatted data) +} +\description{ +Prepare data for Stan +} +\examples{ +set.seed(1) +raw_data <- + serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 5) +prepped_data <- prep_data_stan(raw_data) +} diff --git a/man/prep_priors_stan.Rd b/man/prep_priors_stan.Rd new file mode 100644 index 00000000..12a42630 --- /dev/null +++ b/man/prep_priors_stan.Rd @@ -0,0 +1,89 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/prep_priors_stan.R +\name{prep_priors_stan} +\alias{prep_priors_stan} +\title{Prepare priors for Stan} +\usage{ +prep_priors_stan( + max_antigens, + mu_hyp_param = c(1, 7, 1, -4, -1), + prec_hyp_param = c(1, 1e-05, 1, 0.001, 1), + omega_param = c(1, 50, 1, 10, 1), + wishdf_param = 20, + prec_logy_hyp_param = c(4, 1) +) +} +\arguments{ +\item{max_antigens}{An \link{integer} specifying how many +antigen-isotypes (biomarkers) will be modeled.} + +\item{mu_hyp_param}{A \link{numeric} \link{vector} of 5 values representing the prior +mean for the population level parameters +parameters (y0, y1, t1, r, alpha) for each biomarker. +If specified, must be 5 values long, representing the following parameters: +\itemize{ +\item y0 = baseline antibody concentration (default = 1.0) +\item y1 = peak antibody concentration (default = 7.0) +\item t1 = time to peak (default = 1.0) +\item r = shape parameter (default = -4.0) +\item alpha = decay rate (default = -1.0) +}} + +\item{prec_hyp_param}{A \link{numeric} \link{vector} of 5 values corresponding to +hyperprior diagonal entries for the precision matrix (i.e. inverse variance) +representing prior covariance of uncertainty around \code{mu_hyp_param}. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, r = 0.001, alpha = 1.0 +}} + +\item{omega_param}{A \link{numeric} \link{vector} of 5 values corresponding to the +diagonal entries representing the Wishart hyperprior +distributions of \code{prec_hyp_param}, describing how much we expect parameters +to vary between individuals. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, r = 10.0, alpha = 1.0 +}} + +\item{wishdf_param}{An \link{integer} \link{vector} of 1 value specifying the degrees +of freedom for the Wishart hyperprior distribution of \code{prec_hyp_param}. +If specified, must be 1 value long. +\itemize{ +\item default = 20.0 +\item The value of \code{wishdf_param} controls how informative the Wishart prior +is. Higher values lead to tighter priors on individual variation. +Lower values (e.g., 5–10) make the prior more weakly informative, +which can help improve convergence if the model is over-regularized. +}} + +\item{prec_logy_hyp_param}{A \link{numeric} \link{vector} of 2 values corresponding to +hyperprior diagonal entries on the log-scale for the precision matrix +(i.e. inverse variance) representing prior beliefs of individual variation. +If specified, must be 2 values long: +\itemize{ +\item defaults = 4.0, 1.0 +}} +} +\value{ +A "curve_params_priors_stan" object +(a subclass of \link{list} with the inputs to \code{prep_priors_stan()} attached +as \link{attributes} entry named \code{"used_priors"}), containing Stan-formatted +priors. +} +\description{ +Takes multiple \link{vector} inputs to allow for modifiable priors for Stan +models. Converts JAGS precision-based priors to Stan covariance-based +priors. +} +\examples{ + +prep_priors_stan(max_antigens = 2, + mu_hyp_param = c(1.0, 7.0, 1.0, -4.0, -1.0), + prec_hyp_param = c(1.0, 0.00001, 1.0, 0.001, 1.0), + omega_param = c(1.0, 50.0, 1.0, 10.0, 1.0), + wishdf_param = 20, + prec_logy_hyp_param = c(4.0, 1.0)) + +prep_priors_stan(max_antigens = 2) +} diff --git a/man/run_mod_stan.Rd b/man/run_mod_stan.Rd new file mode 100644 index 00000000..526dd116 --- /dev/null +++ b/man/run_mod_stan.Rd @@ -0,0 +1,137 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/run_mod_stan.R +\name{run_mod_stan} +\alias{run_mod_stan} +\title{Run Stan Model} +\usage{ +run_mod_stan( + data, + file_mod = serodynamics_example("model.stan"), + nchain = 4, + nadapt = 1000, + niter = 1000, + strat = NA, + with_post = FALSE, + ... +) +} +\arguments{ +\item{data}{A \code{\link[base:data.frame]{base::data.frame()}} with the required columns (see details).} + +\item{file_mod}{The name of the file that contains model structure +(a .stan file).} + +\item{nchain}{An \link{integer} between 1 and 4 that specifies +the number of MCMC chains to be run per Stan model.} + +\item{nadapt}{An \link{integer} specifying the number of warmup/adaptation +iterations per chain (Stan equivalent of JAGS adapt + burnin).} + +\item{niter}{An \link{integer} specifying the number of post-warmup iterations.} + +\item{strat}{A \link{character} string specifying the stratification variable, +entered in quotes.} + +\item{with_post}{A \link{logical} value specifying whether a raw \code{stan_fit} +component should be included as an element of the \link{list} object returned +by \code{run_mod_stan()} (see \code{Value} section below for details). +Note: These objects can be large.} + +\item{...}{ + Arguments passed on to \code{\link[=prep_priors_stan]{prep_priors_stan}} + \describe{ + \item{\code{max_antigens}}{An \link{integer} specifying how many +antigen-isotypes (biomarkers) will be modeled.} + \item{\code{mu_hyp_param}}{A \link{numeric} \link{vector} of 5 values representing the prior +mean for the population level parameters +parameters (y0, y1, t1, r, alpha) for each biomarker. +If specified, must be 5 values long, representing the following parameters: +\itemize{ +\item y0 = baseline antibody concentration (default = 1.0) +\item y1 = peak antibody concentration (default = 7.0) +\item t1 = time to peak (default = 1.0) +\item r = shape parameter (default = -4.0) +\item alpha = decay rate (default = -1.0) +}} + \item{\code{prec_hyp_param}}{A \link{numeric} \link{vector} of 5 values corresponding to +hyperprior diagonal entries for the precision matrix (i.e. inverse variance) +representing prior covariance of uncertainty around \code{mu_hyp_param}. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 0.00001, t1 = 1.0, r = 0.001, alpha = 1.0 +}} + \item{\code{omega_param}}{A \link{numeric} \link{vector} of 5 values corresponding to the +diagonal entries representing the Wishart hyperprior +distributions of \code{prec_hyp_param}, describing how much we expect parameters +to vary between individuals. +If specified, must be 5 values long: +\itemize{ +\item defaults: y0 = 1.0, y1 = 50.0, t1 = 1.0, r = 10.0, alpha = 1.0 +}} + \item{\code{wishdf_param}}{An \link{integer} \link{vector} of 1 value specifying the degrees +of freedom for the Wishart hyperprior distribution of \code{prec_hyp_param}. +If specified, must be 1 value long. +\itemize{ +\item default = 20.0 +\item The value of \code{wishdf_param} controls how informative the Wishart prior +is. Higher values lead to tighter priors on individual variation. +Lower values (e.g., 5–10) make the prior more weakly informative, +which can help improve convergence if the model is over-regularized. +}} + \item{\code{prec_logy_hyp_param}}{A \link{numeric} \link{vector} of 2 values corresponding to +hyperprior diagonal entries on the log-scale for the precision matrix +(i.e. inverse variance) representing prior beliefs of individual variation. +If specified, must be 2 values long: +\itemize{ +\item defaults = 4.0, 1.0 +}} + }} +} +\value{ +An \code{sr_model} class object: a subclass of \link[dplyr:tbl_df]{dplyr::tbl_df} that +contains MCMC samples from the joint posterior distribution of the model +parameters, conditional on the provided input \code{data}, +including the same structure as \code{run_mod()}. +} +\description{ +\code{run_mod_stan()} takes a data frame and adjustable MCMC inputs to fit a +Bayesian model using Stan (via cmdstanr) to estimate antibody dynamic +curve parameters. The model estimates seroresponse dynamics to an +infection. The antibody dynamic curve includes the following parameters: +\itemize{ +\item y0 = baseline antibody concentration +\item y1 = peak antibody concentration +\item t1 = time to peak +\item shape = shape parameter +\item alpha = decay rate +} +} +\examples{ +\dontrun{ + # This example requires cmdstanr and CmdStan to be installed + # See ?run_mod_stan for installation instructions + + library(dplyr) + set.seed(1) + strat1 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 2") + strat2 <- serocalculator::typhoid_curves_nostrat_100 |> + sim_case_data(n = 100) |> + mutate(strat = "stratum 1") + + dataset <- bind_rows(strat1, strat2) + + fitted_model <- run_mod_stan( + data = dataset, # The data set input + file_mod = serodynamics_example("model.stan"), + nchain = 4, # Number of mcmc chains to run + nadapt = 500, # Number of warmup iterations + niter = 1000, # Number of sampling iterations + strat = "strat" # Variable to be stratified + ) +} +} +\author{ +Sam Schildhauer, GitHub Copilot +}