DA results change when varying d by one dimension on the same integrated object

Hi MiloR team,

I see a substantial change in differential abundance (DA) results (see attached figures) when I change the number of reduced dimensions (d) by one, with the input object and the rest of the pipeline held identical. I would like to confirm whether this level of sensitivity is expected.

**Dataset and preprocessing**
Starting object: 116,014 cells
Integration performed with STACAS
Analyses run on the integrated assay
PCA and UMAP already computed in the Seurat object

**Groups compared:** at_risk (4 patients) vs Naive (7 patients)
**Cell distribution:**
at_risk = 2882 cells
Naive   = 27130 cells

UMAPs split by condition clearly show differences in cell distributions across regions of the embedding (see attached figures).

**Key observation**
Using the exact same pipeline but changing only the number of dimensions:
d = 26 → some neighbourhoods pass significance thresholds
d = 27 → no neighbourhoods significant
Neighbourhood counts also change slightly:
26 dims → 4773 neighbourhoods
27 dims → 4720 neighbourhoods

**Simplified pipeline**
library(Seurat)
library(SingleCellExperiment)
library(miloR)
library(dplyr)  # for distinct()
ndim <- 26  # or 27
TS <- RunUMAP(TS1, dims = 1:ndim)
DefaultAssay(TS) <- "integrated"
sce <- as.SingleCellExperiment(TS)
milo <- Milo(sce)
milo <- buildGraph(milo, k = 39, d = ndim)
milo <- makeNhoods(milo, prop = 0.05, k = 39, d = ndim, refined = TRUE)
milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample = "orig.ident")
design <- data.frame(colData(milo))[, c("orig.ident", "Condition3", "ChemistryV")]
design <- distinct(design)
rownames(design) <- design$orig.ident
design <- design[colnames(nhoodCounts(milo)), ]
milo <- calcNhoodDistance(milo, d = ndim)
da_results <- testNhoods(
  milo,
  design = ~0 + Condition3 + ChemistryV,
  design.df = design,
  model.contrasts = "Condition3at_risk - Condition3Naive",
  fdr.weighting = "graph-overlap"
)
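
To make the comparison between runs concrete, here is a minimal sketch of how the per-d results could be tabulated. It assumes each run's `da_results` is stored in a named list keyed by d. The helper name `summarise_da` and the 10% SpatialFDR cutoff are illustrative assumptions and are not part of miloR; the toy data frames are made-up stand-ins, not my actual output.

```r
# Hypothetical helper: tabulate neighbourhood counts and significant hits per d.
# `results` is a named list of data frames shaped like testNhoods() output,
# i.e. one row per neighbourhood with a SpatialFDR column.
# `alpha` is an assumed significance cutoff.
summarise_da <- function(results, alpha = 0.1) {
  data.frame(
    d = names(results),
    n_nhoods = vapply(results, nrow, integer(1)),
    n_signif = vapply(
      results,
      function(x) sum(x$SpatialFDR < alpha, na.rm = TRUE),
      integer(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy stand-ins for two runs (made-up values for illustration only)
toy <- list(
  "26" = data.frame(SpatialFDR = c(0.01, 0.05, 0.40, 0.90)),
  "27" = data.frame(SpatialFDR = c(0.20, 0.35, 0.80))
)
summary_tbl <- summarise_da(toy)
print(summary_tbl)
```

With the real pipeline, the list would be filled by rerunning the steps above for each value of ndim and storing the resulting `da_results` under that value.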

**Package versions**
miloR 2.2.0
SingleCellExperiment 1.28.1
Seurat 5.1.0
STACAS 2.2.2
 
 
Is this level of instability expected due to how neighbourhoods are defined in PCA space, especially in large integrated datasets with uneven group sizes?

Do you recommend:
1) a strategy to choose d more robustly, or
2) best practices to assess stability of DA results across dimensionalities?

Thanks a lot for your help!
Best wishes, 
Lavinia

<img width="1188" height="377" alt="Image" src="https://github.com/user-attachments/assets/7f9c778a-5c2d-4887-a734-751606eb7f55" />
<img width="1318" height="465" alt="Image" src="https://github.com/user-attachments/assets/dce45a37-9882-44cd-9f7d-2d8a20158d7f" />

<img width="337" height="361" alt="Image" src="https://github.com/user-attachments/assets/d9985c05-99a0-4120-af2a-3e0306e4dd20" />
<img width="366" height="370" alt="Image" src="https://github.com/user-attachments/assets/4ec6736d-bc8f-4df8-b9e8-7da2f6ae05c3" />
