
DA results change when varying d by one dimension on the same integrated object #379

@lcoletto

Description


Hi miloR team,

I am observing a substantial change in differential abundance (DA) results (see attached) when varying the number of dimensions (d) by just one unit, while keeping the input object and the full pipeline identical. I would like to confirm whether this level of sensitivity is expected.

Dataset and preprocessing
Starting object: 116,014 cells
Integration performed with STACAS
Analyses run on the integrated assay
PCA and UMAP already computed in the Seurat object

Groups compared: at_risk (4 patients) vs Naive (7 patients)
Cell distribution:
at_risk = 2882 cells
Naive = 27130 cells

UMAPs split by condition clearly show differences in cell distributions across regions of the embedding (see attached figures).

Key observation
Using the exact same pipeline but changing only the number of dimensions:
d = 26 → some neighbourhoods pass significance thresholds
d = 27 → no neighbourhoods significant
Neighbourhood counts also change slightly:
26 dims → 4773 neighbourhoods
27 dims → 4720 neighbourhoods
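One way I thought of to quantify how much the graph itself moves between d = 26 and d = 27 is to compare the k-nearest-neighbour sets built from the two PCA subspaces directly. A minimal sketch, assuming the full PCA cell embedding is available as a cells × PCs matrix (here taken from the Seurat object via Embeddings(); the per-cell Jaccard overlap is my own illustration, not a miloR function):

```
library(BiocNeighbors)
library(Seurat)

pcs <- Embeddings(TS1, reduction = "pca")   # cells x PCs matrix

# Neighbour indices for the same k in the 26- and 27-dimensional subspaces
knn26 <- findKNN(pcs[, 1:26], k = 39)$index
knn27 <- findKNN(pcs[, 1:27], k = 39)$index

# Per-cell Jaccard overlap between the two neighbour sets
jac <- sapply(seq_len(nrow(pcs)), function(i) {
  a <- knn26[i, ]
  b <- knn27[i, ]
  length(intersect(a, b)) / length(union(a, b))
})
summary(jac)  # low overlap would explain why nhood counts and DA results shift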

Simplified pipeline
library(Seurat)
library(SingleCellExperiment)
library(miloR)
library(dplyr)  # for distinct()

ndim <- 26 # or 27
TS <- RunUMAP(TS1, dims = 1:ndim)
DefaultAssay(TS) <- "integrated"
sce <- as.SingleCellExperiment(TS)
milo <- Milo(sce)
milo <- buildGraph(milo, k = 39, d = ndim)
milo <- makeNhoods(milo, prop = 0.05, k = 39, d = ndim, refined = TRUE)
milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample = "orig.ident")
design <- data.frame(colData(milo))[, c("orig.ident", "Condition3", "ChemistryV")]
design <- distinct(design)
rownames(design) <- design$orig.ident
design <- design[colnames(nhoodCounts(milo)), ]
milo <- calcNhoodDistance(milo, d = ndim)
da_results <- testNhoods(
  milo,
  design = ~ 0 + Condition3 + ChemistryV,
  design.df = design,
  model.contrasts = "Condition3at_risk - Condition3Naive",
  fdr.weighting = "graph-overlap"
)
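One confound I tried to rule out before comparing d values: makeNhoods() samples its initial index cells at random, so two runs can differ even at the same d unless the RNG seed is fixed. A minimal addition (the seed value is arbitrary):

```
set.seed(42)  # arbitrary; fixes the random sampling of index cells in makeNhoods()
milo <- makeNhoods(milo, prop = 0.05, k = 39, d = ndim, refined = TRUE)
```

With the seed fixed, any remaining difference between d = 26 and d = 27 should come from the dimensionality itself rather than neighbourhood sampling.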

packageVersion("miloR")                 # ‘2.2.0’
packageVersion("SingleCellExperiment")  # ‘1.28.1’
packageVersion("Seurat")                # ‘5.1.0’
packageVersion("STACAS")                # ‘2.2.2’

Is this level of instability expected due to how neighbourhoods are defined in PCA space, especially in large integrated datasets with uneven group sizes?

Do you recommend:

  1. a strategy to choose d more robustly, or
  2. best practices to assess stability of DA results across dimensionalities?
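For question 2, this is the kind of stability scan I had in mind: wrap the pipeline in a function of d and tabulate how many neighbourhoods pass FDR at each dimensionality. A sketch only — run_milo() is a hypothetical wrapper around the pipeline above, and the d range and FDR cutoff are placeholders:

```
# Hypothetical wrapper: run_milo(d) repeats the full pipeline above for a
# given number of dimensions and returns the data.frame from testNhoods().
scan <- lapply(20:35, function(d) {
  res <- run_milo(d)
  data.frame(d = d, n_sig = sum(res$SpatialFDR < 0.1, na.rm = TRUE))
})
do.call(rbind, scan)  # inspect how n_sig varies with d
```

If n_sig fluctuates sharply around a single d, that would suggest the d = 26 vs 27 result is a knife-edge rather than a robust signal.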

Thanks a lot for your help!
Best wishes,
Lavinia

[Four attached images: UMAPs split by condition and the DA results referenced above]
