Description
Hi,
I'm extremely excited about support for large-scale, fast I/O in PyTorch. I am trying to run the example and downloaded ImageNet for it. As you might be aware, ImageNet is no longer available for download from http://www.image-net.org/download and is now hosted on Kaggle. I downloaded the dataset from there, but its format seems to have changed from the previous version, and it can no longer be loaded with PyTorch's built-in Dataset class. This leads to errors when creating shards.
Here's the error I get:
The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in ./data
The downloaded dataset has the following structure:
.
├── Annotations
│   └── CLS-LOC
│       ├── train
│       └── val
├── Data
│   └── CLS-LOC
│       ├── test
│       ├── train
│       └── val
└── ImageSets
    └── CLS-LOC
        ├── test.txt
        ├── train_cls.txt
        ├── train_loc.txt
        └── val.txt
Can we come up with a workaround that works out of the box with the current distribution of ImageNet? The original PyTorch ImageNet example works with it, since it only needs the image files. I think the error originates from parsing the metadata while making shards, so a workaround should be possible. Happy to help with this.
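One possible shape for such a workaround, as a minimal sketch only: index the train images straight from the directory tree, without reading the devkit archive. This assumes that Data/CLS-LOC/train contains one subdirectory per class with that class's images inside it, which is not verified here. The function name `index_train_images` is hypothetical and not part of any library. The val split is not handled, since how its labels are stored in this layout has not been checked.

```python
from pathlib import Path


def index_train_images(root):
    """Return (samples, classes) for the train split without using the devkit.

    Assumption (not verified): <root>/Data/CLS-LOC/train holds one
    subdirectory per class, each containing that class's image files.
    """
    train_dir = Path(root) / "Data" / "CLS-LOC" / "train"

    # Class names are the subdirectory names, sorted so that the
    # name-to-index mapping is the same on every run.
    classes = sorted(p.name for p in train_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}

    # Collect (image path, class index) pairs in a deterministic order.
    samples = []
    for name in classes:
        for img in sorted((train_dir / name).iterdir()):
            if img.is_file():
                samples.append((str(img), class_to_idx[name]))
    return samples, classes
```

The resulting list of (path, label) pairs could then be fed to whatever code writes the shards, in place of the metadata parsed from the devkit.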
Best,
Spandan