Background: the dataloader slows down over time, especially when training from a large number of slides. Data that is persistent in memory loads quickly (the case for a very small number of slides), but not when training across many slides. One issue is calling `.compute()` inside `__getitem__`, while still needing to apply data augmentations (albumentations) consistently to both the image and its mask for the semantic segmentation task; making the dataloading more daskified therefore adds some complexity:
- https://discuss.pytorch.org/t/deadlock-with-dataloader-and-xarray-dask/9387
- Using dask with PyTorch (train a model) dask/distributed#2581
- https://examples.dask.org/machine-learning/torch-prediction.html
- https://github.com/horovod/horovod
The bottleneck is in `__getitem__`: once the data is actually loaded, it passes through the DL model quickly.
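To make the `__getitem__` bottleneck concrete, here is a minimal sketch of a map-style dataset over lazy dask arrays. All names (`SlideDataset`, `joint_hflip`, the array shapes) are hypothetical, and the joint flip stands in for an albumentations transform that receives both `image=` and `mask=` so the same spatial augmentation hits both:

```python
import numpy as np
import dask.array as da

# Hypothetical lazy slide stack: (N, H, W, C) images and (N, H, W) masks,
# chunked one sample per chunk (as a zarr-backed store might be).
images = da.random.random((8, 64, 64, 3), chunks=(1, 64, 64, 3))
masks = da.random.randint(0, 2, (8, 64, 64), chunks=(1, 64, 64))

def joint_hflip(image, mask):
    """Stand-in for an albumentations transform: the same spatial op
    must be applied to the image and its segmentation mask."""
    return image[:, ::-1, :], mask[:, ::-1]

class SlideDataset:
    """Minimal map-style dataset; __getitem__ mirrors torch.utils.data.Dataset."""
    def __init__(self, images, masks):
        self.images, self.masks = images, masks

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # .compute() materializes one chunk per sample -- this per-item
        # round trip through the dask scheduler is the suspected slowdown.
        img = self.images[idx].compute()
        msk = self.masks[idx].compute()
        return joint_hflip(img, msk)

ds = SlideDataset(images, masks)
img, msk = ds[0]
```

Because the augmentation runs on materialized numpy arrays, it stays outside the dask graph; moving it inside (or batching the `.compute()`) is where the complexity mentioned above comes in.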
Potentially nice ideas:
- Daskifying the collate function when collecting data
- Chunk size of zarr/dask arrays
- https://github.com/muammar/ml4chem/blob/5bc7808dc0c3ecd650bc52ebc14c2c6fa4e93ef9/ml4chem/atomistic/models/autoencoders.py#L1062
- https://github.com/muammar/ml4chem/blob/d2dec155f53aedada4b106f2173cf315a8b95b2b/ml4chem/atomistic/models/neuralnetwork.py#L662
- Will add more here to this issue.
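The first two ideas above could be combined: return uncomputed dask slices from `__getitem__` and trigger a single `.compute()` per batch in the collate function, with the array rechunked so one chunk covers one batch. This is a speculative sketch, not a tested fix; `dask_collate` and the shapes are hypothetical, and in a real pipeline it would be passed to `DataLoader(collate_fn=...)`:

```python
import numpy as np
import dask.array as da

# Hypothetical lazy slide stack, rechunked so one chunk spans one batch of 4
# (aligning the zarr/dask chunk size with the batch size, per the idea above).
batch_size = 4
images = da.random.random((8, 64, 64, 3)).rechunk((batch_size, 64, 64, 3))

def dask_collate(lazy_samples):
    """Daskified collate: stack still-lazy samples into one dask array and
    issue a single .compute() per batch instead of one per __getitem__."""
    return da.stack(lazy_samples).compute()

# Samples stay lazy until collate time; only the batch is materialized.
batch = dask_collate([images[i] for i in range(batch_size)])
```

With chunks aligned to batches, the per-batch `.compute()` should touch one chunk rather than `batch_size` separate ones, which is the hoped-for win.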
@lvaickus , can you comment more here?