Change in ImageNet format after being hosted on Kaggle #3

@Spandan-Madan

Hi,

I'm extremely excited about support for large-scale, fast I/O in PyTorch. I am trying to run the example and have downloaded ImageNet. As you may be aware, ImageNet is no longer available for download from http://www.image-net.org/download and is now hosted on Kaggle. I downloaded the dataset from there, but its format seems to have changed from the previous release, and it can no longer be loaded with PyTorch's built-in dataset class (`torchvision.datasets.ImageNet`). This leads to errors when creating shards.

Here's the error I get:

```
The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in ./data
```
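The message is raised by `torchvision.datasets.ImageNet`, which reads its class metadata from the devkit archive in the root directory. The Kaggle download ships already-extracted folders instead, so that check fails before any images are read. A minimal sketch of telling the two layouts apart before building shards, assuming the Kaggle tree is extracted directly under `root` (`detect_imagenet_layout` is a hypothetical helper, not part of torchvision or this repository):

```python
import os


def detect_imagenet_layout(root: str) -> str:
    """Hypothetical helper: report which ImageNet layout lives under `root`.

    "devkit"  -> the archive layout torchvision's ImageNet class expects
                 (ILSVRC2012_devkit_t12.tar.gz in the root directory).
    "kaggle"  -> the extracted Data/Annotations/ImageSets tree (assumed to
                 sit directly under `root`).
    "unknown" -> neither was found.
    """
    if os.path.isfile(os.path.join(root, "ILSVRC2012_devkit_t12.tar.gz")):
        return "devkit"
    if os.path.isdir(os.path.join(root, "Data", "CLS-LOC", "train")):
        return "kaggle"
    return "unknown"
```

Only the `"devkit"` case can go through `torchvision.datasets.ImageNet` unchanged; the `"kaggle"` case needs labels derived from the folder tree instead.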

The downloaded dataset has the following structure:

```
.
├── Annotations
│   └── CLS-LOC
│       ├── train
│       └── val
├── Data
│   └── CLS-LOC
│       ├── test
│       ├── train
│       └── val
└── ImageSets
    └── CLS-LOC
        ├── test.txt
        ├── train_cls.txt
        ├── train_loc.txt
        └── val.txt
```
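Because the training images in this tree sit in one subfolder per WordNet ID, the training split should need no devkit metadata at all. A minimal sketch, assuming `Data/CLS-LOC/train/<wnid>/<image>.JPEG` under `root` (`index_train_split` is a hypothetical name); pointing `torchvision.datasets.ImageFolder` at `Data/CLS-LOC/train` should yield the same sorted class ordering:

```python
import os
from typing import Dict, List, Tuple


def index_train_split(root: str) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Index Kaggle-layout ImageNet training images as (path, class_index) pairs.

    Assumes Data/CLS-LOC/train/<wnid>/<image>.JPEG, i.e. one subfolder per
    WordNet ID (hypothetical helper, not this repository's API).
    """
    train_dir = os.path.join(root, "Data", "CLS-LOC", "train")
    # Sorted folder names give a stable class-index mapping, the same
    # ordering torchvision.datasets.ImageFolder uses.
    wnids = sorted(e.name for e in os.scandir(train_dir) if e.is_dir())
    class_to_idx = {wnid: i for i, wnid in enumerate(wnids)}

    samples = []
    for wnid in wnids:
        class_dir = os.path.join(train_dir, wnid)
        for fname in sorted(os.listdir(class_dir)):
            # Keep only JPEG files; skip anything else in the class folder.
            if fname.lower().endswith((".jpeg", ".jpg")):
                samples.append((os.path.join(class_dir, fname), class_to_idx[wnid]))
    return samples, class_to_idx
```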

Can we come up with a workaround that works out of the box with the current distribution of ImageNet? The original PyTorch ImageNet example works with it because it only needs the image files. I think the error originates from parsing the metadata while making shards, so a workaround should be possible. Happy to help with this.
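One possible workaround for the validation split is to read labels from the per-image XML files under `Annotations/CLS-LOC/val` instead of the devkit. A minimal sketch, assuming each `<stem>.xml` pairs with `Data/CLS-LOC/val/<stem>.JPEG` and that `<object><name>` holds the WordNet ID; `class_to_idx` is assumed to map WordNet IDs to indices (for example, built from the sorted training folder names). The `<key>.jpg`/`<key>.cls` tar naming in `write_shards` is an assumed WebDataset-style convention, not necessarily what this repository's shard writer produces, and both function names are hypothetical:

```python
import io
import os
import tarfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def index_val_split(root: str, class_to_idx: Dict[str, int]) -> List[Tuple[str, int]]:
    """Label validation images from their XML annotations (no devkit needed)."""
    ann_dir = os.path.join(root, "Annotations", "CLS-LOC", "val")
    img_dir = os.path.join(root, "Data", "CLS-LOC", "val")
    samples = []
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        tree = ET.parse(os.path.join(ann_dir, fname))
        # Use the first <object><name> as the image-level class label.
        wnid = tree.getroot().findtext("object/name")
        stem = fname[: -len(".xml")]
        samples.append((os.path.join(img_dir, stem + ".JPEG"), class_to_idx[wnid]))
    return samples


def write_shards(samples: List[Tuple[str, int]], pattern: str, per_shard: int = 1000) -> List[str]:
    """Write (path, label) pairs into tar shards named pattern % shard_index."""
    paths = []
    for start in range(0, len(samples), per_shard):
        path = pattern % (start // per_shard)
        with tarfile.open(path, "w") as tar:
            for img_path, label in samples[start:start + per_shard]:
                key = os.path.splitext(os.path.basename(img_path))[0]
                # Image bytes are stored unchanged under <key>.jpg.
                tar.add(img_path, arcname=key + ".jpg")
                # The integer label is stored as text under <key>.cls.
                data = str(label).encode("utf-8")
                info = tarfile.TarInfo(name=key + ".cls")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        paths.append(path)
    return paths
```

Calling `write_shards(index_val_split(root, class_to_idx), "val-%06d.tar")` would then produce shards without touching `ILSVRC2012_devkit_t12.tar.gz`.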

Best,
Spandan
