Hi MiloR team,
I am observing a substantial change in differential abundance (DA) results (see attached) when varying the number of dimensions (d) by just one unit, while keeping the input object and the full pipeline identical. I would like to confirm whether this level of sensitivity is expected.
Dataset and preprocessing
- Starting object: 116,014 cells
- Integration performed with STACAS
- Analyses run on the integrated assay
- PCA and UMAP already computed in the Seurat object
- Groups compared: at_risk (4 patients) vs Naive (7 patients)
- Cell distribution:
  - at_risk = 2,882 cells
  - Naive = 27,130 cells
UMAPs split by condition clearly show differences in cell distributions across regions of the embedding (see attached figures).
Key observation
Using the exact same pipeline but changing only the number of dimensions:
- d = 26 → some neighbourhoods pass significance thresholds
- d = 27 → no neighbourhoods significant
Neighbourhood counts also change slightly:
- 26 dims → 4,773 neighbourhoods
- 27 dims → 4,720 neighbourhoods
Simplified pipeline
ndim <- 26 # or 27
TS <- RunUMAP(TS1, dims = 1:ndim)
DefaultAssay(TS) <- "integrated"
sce <- as.SingleCellExperiment(TS)
milo <- Milo(sce)
milo <- buildGraph(milo, k = 39, d = ndim)
milo <- makeNhoods(milo, prop = 0.05, k = 39, d = ndim, refined = TRUE)
milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample = "orig.ident")
design <- data.frame(colData(milo))[, c("orig.ident", "Condition3", "ChemistryV")]
design <- distinct(design)
rownames(design) <- design$orig.ident
design <- design[colnames(nhoodCounts(milo)), ]
milo <- calcNhoodDistance(milo, d = ndim)
da_results <- testNhoods(
  milo,
  design = ~0 + Condition3 + ChemistryV,
  design.df = design,
  model.contrasts = "Condition3at_risk - Condition3Naive",
  fdr.weighting = "graph-overlap"
)
packageVersion("miloR")
[1] ‘2.2.0’
packageVersion("SingleCellExperiment")
[1] ‘1.28.1’
packageVersion("Seurat")
[1] ‘5.1.0’
packageVersion("STACAS")
[1] ‘2.2.2’
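To compare more than two values of d, the pipeline above could be wrapped in a function and swept over a range of dimensionalities. This is a minimal sketch and not part of the analysis reported here. `sweep_dims`, `run_da` and the `alpha = 0.1` cut-off are hypothetical names and values introduced for illustration. `run_da(d)` is assumed to run the steps above for a given `d` and return the `testNhoods()` table, of which only the `SpatialFDR` column is used.

```r
# Summarise DA results for several values of d.
# `run_da` is a user-supplied (hypothetical) function: it takes d and returns
# the data.frame produced by testNhoods(), which contains a SpatialFDR column.
# `alpha` is an assumed significance cut-off, not a value from the analysis.
sweep_dims <- function(dims, run_da, alpha = 0.1) {
  rows <- lapply(dims, function(d) {
    res <- run_da(d)
    # Drop NA values before counting and taking the minimum
    fdr <- res$SpatialFDR[!is.na(res$SpatialFDR)]
    data.frame(
      d        = d,
      n_nhoods = nrow(res),
      n_sig    = sum(fdr < alpha),
      min_fdr  = if (length(fdr) > 0) min(fdr) else NA_real_
    )
  })
  do.call(rbind, rows)
}
```

Calling `sweep_dims(20:30, run_da)` would give one row per value of d. If `min_fdr` sits close to the cut-off across the whole range, the difference between d = 26 and d = 27 may reflect a threshold effect rather than a qualitative change in the results.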
Is this level of instability expected due to how neighbourhoods are defined in PCA space, especially in large integrated datasets with uneven group sizes?
Do you recommend:
- a strategy to choose d more robustly, or
- best practices to assess stability of DA results across dimensionalities?
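Because neighbourhoods are re-sampled for each value of d, neighbourhood indices cannot be matched directly between runs. One possible way to assess stability is to compare, at the cell level, which cells fall in at least one significant neighbourhood. The sketch below is an illustration under stated assumptions, not an established miloR recipe. `sig_cells`, `jaccard` and `alpha = 0.1` are hypothetical. `memb` is assumed to be the cells x neighbourhoods membership matrix with cell names as row names (for example from `nhoods(milo)`, with the Matrix package attached if it is sparse), and `res` is assumed to have one row per neighbourhood in the same order as the columns of `memb`.

```r
# Cells covered by at least one significant neighbourhood.
# `memb`: cells x neighbourhoods 0/1 matrix with cell names as rownames.
# `res`:  testNhoods() table, assumed to be in the same order as ncol(memb).
# `alpha`: assumed significance cut-off on SpatialFDR.
sig_cells <- function(memb, res, alpha = 0.1) {
  sig <- which(res$SpatialFDR < alpha)  # which() ignores NA values
  if (length(sig) == 0) return(character(0))
  covered <- rowSums(memb[, sig, drop = FALSE]) > 0
  rownames(memb)[covered]
}

# Jaccard index between two sets of cell names (0 when both are empty).
jaccard <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(0)
  length(intersect(a, b)) / length(u)
}
```

For two runs, `jaccard(sig_cells(memb26, res26), sig_cells(memb27, res27))` gives a value between 0 and 1, where the argument names are placeholders for the objects from each run. A value near 1 would mean the same cells are flagged regardless of d.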
Thanks a lot for your help!
Best wishes,
Lavinia
