
Inconsistent error with tensor mismatch #48

@sanjansen

Dear Team Decima,

Our names are Sanne Jansen and Orfeas Gkourlias, and we are both students in the Master's programme Data Science for Life Sciences at the Hanze University of Applied Sciences in Groningen (Netherlands). We are currently working on a project in which we would like to use your model, Decima, specifically to predict variant effects (predict_variant_effect) through your Python API.

For this, we have run Decima on single-cell eQTL data. We specify the cell type as a task, e.g.:
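```python
import pandas as pd
# predict_variant_effect is imported from Decima's Python API;
# args and device are defined earlier in our script.

df_variant = pd.read_table(args.input)
predict_variant_effect(
    df_variant,
    output_pq=args.output,
    device=device,
    tasks="cell_type == 'B cell'",  # select tasks whose cell_type is 'B cell'
    genome="path/to/hg38/hg38.fa",
)
```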
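Because the `tasks` argument is a query string on the `cell_type` column, the strings for several cell types (for example, the ones mentioned in this report) can be generated with a small helper. This is a minimal sketch: `task_query` and `CELL_TYPES` are hypothetical names, and the helper only reproduces the string format used in the call above.

```python
# Hypothetical helper: builds a tasks filter in the same form as
# "cell_type == 'B cell'". The cell types are examples from this report.
CELL_TYPES = ["classical monocyte", "B cell", "NK"]


def task_query(cell_type: str) -> str:
    """Return a query string selecting the tasks for one cell type."""
    # A single quote in the name would break the quoted query string.
    if "'" in cell_type:
        raise ValueError(f"single quote in cell type name: {cell_type!r}")
    return f"cell_type == '{cell_type}'"


# One query string per cell type, keyed by the cell type name.
queries = {ct: task_query(ct) for ct in CELL_TYPES}
```

Each string can then be passed as `tasks=` to a separate `predict_variant_effect` run.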

To avoid computational overhead, we split our eQTL TSVs (df_variant) per chromosome for each cell type. We successfully ran variant effect predictions with Decima for dendritic cells, CD4+ T cells and CD8+ T cells. However, when running predictions for NK cells, monocytes and B cells, chromosome 19 fails, while all other chromosomes successfully produce output.
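The per-chromosome split described above can be sketched with pandas as follows. It assumes the chromosome is stored in a column named `chrom` (an assumption; adjust `chrom_col` to the actual eQTL table), and `split_by_chromosome` is a hypothetical helper, not part of Decima.

```python
from pathlib import Path

import pandas as pd


def split_by_chromosome(df: pd.DataFrame, out_dir: str, prefix: str,
                        chrom_col: str = "chrom") -> list[Path]:
    """Write one TSV per chromosome and return the written paths.

    `chrom_col` is an assumed column name; adjust it to the eQTL table.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    # sort=False keeps chromosomes in order of first appearance.
    for chrom, sub in df.groupby(chrom_col, sort=False):
        path = out / f"{prefix}_{chrom}.tsv"
        sub.to_csv(path, sep="\t", index=False)
        paths.append(path)
    return paths
```

For example, `split_by_chromosome(pd.read_table("b_cell_eqtl.tsv"), "splits", "b_cell")` would write `splits/b_cell_chr19.tsv` and so on (the file names here are hypothetical).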

For the NK cells, monocytes and B cells, the chromosome 19 run fails with the following error:

```
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/path/to/miniconda3/lib/python3.13/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
    data = fetcher.fetch(index) # type: ignore[possibly-undefined]
  File "/path/to/miniconda3/lib/python3.13/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/path/to/miniconda3/lib/python3.13/site-packages/decima/data/dataset.py", line 441, in __getitem__
    inputs = torch.vstack([seq, mask])
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 524289 but got size 524288 for tensor number 1 in the list.
```
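The error itself can be reproduced outside Decima with two tensors whose lengths differ by one position. This is a minimal sketch: the 4-channel sequence and 1-channel mask shapes are assumptions chosen to mirror the numbers in the traceback, not Decima's documented shapes, and `stack_inputs` is a hypothetical wrapper. `torch.vstack` concatenates along dimension 0, so all other dimensions must match, which is why a length of 524,289 against 524,288 fails.

```python
import torch


def stack_inputs(seq: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Stack sequence and mask along dim 0, like the failing line does."""
    return torch.vstack([seq, mask])


# Assumed shapes: a 4-channel sequence and a 1-channel mask whose
# lengths differ by one position, mirroring the traceback.
seq = torch.zeros(4, 524289)
mask = torch.zeros(1, 524288)

try:
    stack_inputs(seq, mask)
except RuntimeError as err:
    # Reports the dimension-1 size mismatch between the two tensors.
    print(err)
```

With equal lengths (e.g. `seq[:, :524288]`), the same call returns a tensor of shape `(5, 524288)`.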

We wonder whether this is a known problem. The failure seems to depend on the cell type, since we did receive proper output for three of the six cell types we tried.
For completeness, these are the tasks we specified for the failing cell types: 'classical monocyte', 'B cell' and 'NK'.
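Since the chromosome 19 inputs differ between cell types, one way to narrow this down is to list the chromosome 19 variants that occur only in a failing cell type's file and not in a succeeding one, and to check whether those are indels. The indel check is only a hypothesis for an off-by-one sequence length, not a confirmed cause. A minimal pandas sketch, assuming columns named `chrom`, `pos`, `ref` and `alt`; `variants_only_in` is a hypothetical helper.

```python
import pandas as pd

# Assumed column names for the variant table; rename to match the TSVs.
KEY = ["chrom", "pos", "ref", "alt"]


def variants_only_in(failing: pd.DataFrame, passing: pd.DataFrame) -> pd.DataFrame:
    """Return variants in `failing` that are absent from `passing`,
    with an `is_indel` flag for ref/alt alleles of unequal length."""
    merged = failing[KEY].merge(
        passing[KEY].drop_duplicates(), on=KEY, how="left", indicator=True
    )
    # "left_only" rows have no matching variant in the passing table.
    only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    only = only.assign(is_indel=only["ref"].str.len() != only["alt"].str.len())
    return only.reset_index(drop=True)
```

For example, `variants_only_in(pd.read_table("nk_chr19.tsv"), pd.read_table("cd4_chr19.tsv"))` (hypothetical file names) would list the NK-specific chromosome 19 variants.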

Operating system: Rocky Linux 9.5
Python version: 3.13.5

We would be grateful for any help in solving this problem. Thank you in advance for your time, and we hope to hear from you soon.
