Hubify

Convert object detection datasets to HuggingFace format and upload to the Hub.

Currently supported formats:

  • COCO format annotations
  • YOLO format annotations
  • YOLO OBB format annotations

Coming soon: Pascal VOC, Labelme, and more!

Motivations for this tool

HuggingFace has become the de facto open-source hub for sharing datasets and models. The community is best known for LLMs and other language models, but nothing about HuggingFace's dataset hosting is specific to language modeling.

This tool consolidates the various annotation formats used in object detection (COCO, Pascal VOC, etc.) into the layout HuggingFace recommends for Image Datasets, and uploads the result to the HuggingFace Hub.

Installation

pip install hubify-dataset

Usage

After installation, you can use the hubify command:

# Auto-detect annotations in train/validation/test directories
hubify --data-dir /path/to/images --format coco

# Manually specify annotation files
hubify --data-dir /path/to/images \
  --train-annotations /path/to/instances_train2017.json \
  --validation-annotations /path/to/instances_val2017.json

# Generate sample visualizations
hubify --data-dir /path/to/images --visualize

# Push to HuggingFace Hub
hubify --data-dir /path/to/images \
  --train-annotations /path/to/instances_train2017.json \
  --push-to-hub username/my-dataset

# YOLO OBB annotations
hubify --data-dir ~/Downloads/DOTAv1.5 --format yolo-obb --push-to-hub benjamintli/dota-v1.5

# YOLO annotations
hubify --data-dir ~/Downloads/DOTAv1.5 --format yolo --push-to-hub benjamintli/dota-v1.5

Or run directly with Python (from the virtual environment):

source .venv/bin/activate
python -m src.main --data-dir /path/to/images

Expected Directory Structure

  • For COCO:
data-dir/
├── train/
│   ├── instances*.json  (auto-detected)
│   └── *.jpg            (images)
├── validation/
│   ├── instances*.json  (auto-detected)
│   └── *.jpg            (images)
└── test/               (optional)
    ├── instances*.json
    └── *.jpg
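
When --format coco is used without explicit annotation paths, the tool auto-detects instances*.json in each split directory. A minimal sketch of that detection logic, assuming the layout above (find_coco_annotations is a hypothetical name for illustration, not the tool's actual implementation):

from pathlib import Path

# Hypothetical sketch: find the first instances*.json per split,
# mirroring the auto-detection described above.
def find_coco_annotations(data_dir: str) -> dict:
    annotations = {}
    for split in ("train", "validation", "test"):
        split_dir = Path(data_dir) / split
        if not split_dir.is_dir():
            continue  # test/ is optional
        matches = sorted(split_dir.glob("instances*.json"))
        if matches:
            annotations[split] = matches[0]
    return annotations

print(find_coco_annotations("/path/to/images"))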

Output

The tool generates metadata.jsonl files in each split directory:

data-dir/
├── train/
│   └── metadata.jsonl
└── validation/
    └── metadata.jsonl

Each line in metadata.jsonl contains:

{
  "file_name": "image.jpg",
  "objects": {
    "bbox": [[x, y, width, height], ...],
    "category": [0, 1, ...]
  }
}
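
Because this matches the metadata.jsonl convention of HuggingFace's imagefolder loader, the converted dataset can be loaded directly with the datasets library, locally or from the Hub after a push. A short example; username/my-dataset is a placeholder repo id:

from datasets import load_dataset

# Load the converted splits locally; imagefolder picks up metadata.jsonl
ds = load_dataset("imagefolder", data_dir="/path/to/images")
example = ds["train"][0]
print(example["objects"])  # {"bbox": [[x, y, w, h], ...], "category": [0, 1, ...]}

# Or load it from the Hub after --push-to-hub
ds = load_dataset("username/my-dataset")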

Options

  • --data-dir: Root directory containing train/validation/test subdirectories (required)
  • --format: Dataset format: 'auto' (default), 'coco', 'yolo', or 'yolo-obb' (optional)
  • --train-annotations: Path to training annotations JSON (optional)
  • --validation-annotations: Path to validation annotations JSON (optional)
  • --test-annotations: Path to test annotations JSON (optional)
  • --visualize: Generate sample visualization images with bounding boxes
  • --push-to-hub: Push dataset to HuggingFace Hub (format: username/dataset-name)
  • --token: HuggingFace API token (optional, defaults to HF_TOKEN env var or huggingface-cli login)

Authentication for Hub Push

When using --push-to-hub, the tool looks for your HuggingFace token in this order:

  1. --token YOUR_TOKEN (CLI argument)
  2. HF_TOKEN environment variable
  3. Token from huggingface-cli login

If no token is found, you'll get a helpful error message with instructions.
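
For steps 2 and 3, the huggingface_hub library exposes the same lookup, so you can check which token will be picked up before pushing. A small check, assuming huggingface_hub is installed in your environment:

from huggingface_hub import get_token, login

# Equivalent of `huggingface-cli login`; caches a token locally
# login()  # prompts for a token, or pass login(token="hf_...")

# get_token() reads the HF_TOKEN env var first, then the cached login token
token = get_token()
print("token found" if token else "no token configured")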
