Skip to content

Reincorporate Stan estimation in latent variable case #10

@robertness

Description

@robertness

Some orphaned code...

Resample the training data and corresponding noise inferences to match the 50% PKC on, 50% PKC off as in the proposed experiment.

training <- mutate(training, data = noise$data)
training_pkc_off <- training[training$PKC == 0, ]
training_pkc_on <- training[training$PKC == 1, ]
training_pkc_off_upsampled <- sample_n(training_pkc_off,  nrow(training_pkc_on), replace=TRUE)
training_sim <- rbind(training_pkc_on, training_pkc_off_upsampled)

```{r}
# The Stan model models one row of data only, so must be run iteratively.
# This is slow.  The alternative is to model the data vectors of length N.  
# This would require a technique for enforcing independence between noise terms of each data point in the posterior.
stan_scm_file <- system.file('extdata', 'sachs_scm.stan', package="causaldemon", mustWork=TRUE)
scm_model <- stan_model(file=stan_scm_file)

Technical note: In this data subset, PKC is set by intervention. So for the sake of accuracy, the Stan model doesn't do inference on a noise term for PKC. This is not a huge concern because the proposed experiment will apply an intervention to PKC, and therefore not use that noise term.

noise_terms <- paste0("N_", c("PKC", "PKA", "Raf", "Mek", "Erk"))
get_noise <- function(item){
  noise_posterior <- sampling(
    scm_model,
    data = as.list(item),
    cores = parallel::detectCores(),
    iter = 2000,
    pars = noise_terms,
    verbose=FALSE
  )
  print(i)
  print(summary(noise_posterior)$summary[, 'Rhat'])
  samples <- rstan::extract(noise_posterior, permuted = T)
  log_prob <- samples[['lp__']]
  samples <- map(samples[noise_terms], as.numeric)
  out <- samples[noise_terms] %>%
    as_tibble %>%
    mutate(i = factor(1)) %>% 
    nest(-i) %>%
    select(-i)
  return(out)
}

noise <- tibble()
for(i in 1:nrow(training)){
  item <- training[i, , drop=FALSE]
  noise <- rbind(noise, get_noise(item))
}
save(noise, file=paste0(getwd(), '/scm_noise_inference.Rdata'))

```{r, echo=FALSE}
load(paste0(getwd(), '/scm_noise_inference.Rdata'))

For each observation in the data, the inference algorithm provided samples from the posterior of the noise terms. The following are the noise term samples for the first observation in the data.

str(noise$data[[1]])

scm_sim_result <- training %>%
mutate(data = noise$data) %>% # add the noise samples as a variable to the data
mutate(replicate = 1:n()) %>% # label each row (replicate) with an index
unnest(data) %>% # expand out the samples so each row corresponds to a sample
mutate(Erk_sim = scm_simulation(PKA, N_Raf, N_Erk)) %>% # simulate Erk
group_by(PKA, replicate) %>% # calculate the mean Erk value (Bayes estimate of Erk) for each replicate
summarize(Erk_sim_mean = mean(Erk_sim)) %>%
ungroup %>%
group_by(PKA) %>% # Condition on PKA
summarize(p_Erk_on = mean(Erk_sim_mean)) %>%
mutate(
PKA = ifelse(PKA == 1, 'on', 'off'),
data = "scm_sim"
)
results <- rbind(results, scm_sim_result)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions