GitHub - jamjamjon/usls: A Rust library integrated with ONNX Runtime, providing a collection of Computer Vision and Vision-Language models such as YOLO, FastVLM, and more.

usls

📘 API Documentation | 🌟 Examples | 📦 Model Zoo

usls is a cross-platform Rust library powered by ONNX Runtime for efficient inference of SOTA vision and vision-language models (typically under 1B parameters).

(Generated by Seedream4.5)

🌟 Highlights

⚡ High Performance: Multi-threading, SIMD, and CUDA-accelerated processing
🌐 Cross-Platform: Linux, macOS, Windows with ONNX Runtime execution providers (CUDA, TensorRT, CoreML, OpenVINO, DirectML, etc.)
🏗️ Unified API: A single Model trait exposing run()/forward()/encode_images()/encode_texts(), with a unified Y output
📥 Auto-Management: Automatic model download (Hugging Face/GitHub), caching, and path resolution
📦 Multiple Inputs: Image, directory, video, webcam, stream and combinations
🎯 Precision Support: FP32, FP16, INT8, UINT8, Q4, Q4F16, BNB4, and more
🛠️ Full-Stack Suite: DataLoader, Annotator, and Viewer for complete workflows
🌱 Model Ecosystem: 50+ SOTA vision and VLM models
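
The "one trait, one output type" idea behind the Unified API can be pictured with a toy sketch. Everything below (the trait, the `Y` fields, `ToyDetector`, `ToyCaptioner`) is hypothetical and written only to illustrate the design; it is not the actual usls `Model` trait or `Y` type, whose real signatures are in the API documentation.

```rust
// Hypothetical sketch of a unified-API design. NOT the real usls API.

/// Unified output container shared by every model (illustrative fields only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Y {
    pub boxes: Vec<[f32; 4]>, // xyxy boxes, filled by detectors
    pub texts: Vec<String>,   // generated text, filled by captioners / VLMs
}

/// One trait that every model implements.
pub trait Model {
    type Input;

    /// Run the model on a single input.
    fn forward(&mut self, x: &Self::Input) -> Y;

    /// Run the model on many inputs; the default simply loops over `forward`.
    fn run(&mut self, xs: &[Self::Input]) -> Vec<Y> {
        xs.iter().map(|x| self.forward(x)).collect()
    }
}

/// Toy "detector": returns one full-image box for an input of size (w, h).
pub struct ToyDetector;

impl Model for ToyDetector {
    type Input = (f32, f32);
    fn forward(&mut self, &(w, h): &(f32, f32)) -> Y {
        Y { boxes: vec![[0.0, 0.0, w, h]], ..Default::default() }
    }
}

/// Toy "captioner": describes the input size in words.
pub struct ToyCaptioner;

impl Model for ToyCaptioner {
    type Input = (f32, f32);
    fn forward(&mut self, &(w, h): &(f32, f32)) -> Y {
        Y { texts: vec![format!("an image of size {w}x{h}")], ..Default::default() }
    }
}

fn main() {
    let inputs: [(f32, f32); 2] = [(640.0, 480.0), (320.0, 320.0)];
    // Different models, same call and same output type.
    let dets = ToyDetector.run(&inputs);
    let caps = ToyCaptioner.run(&inputs);
    println!("{:?}\n{:?}", dets, caps);
}
```

The payoff of this shape is that downstream code (annotation, saving, viewing) only has to understand one output type, whichever model produced it.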

🚀 Quick Start

Run the YOLO-series demo to explore models across different tasks, precisions, and execution providers:

Tasks: detect, segment, pose, classify, obb
Versions: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLOv12, YOLOv13, YOLO26
Scales: n, s, m, l, x
Precision: fp32, fp16, q8, q4, q4f16, bnb4
Execution Providers: CPU, CUDA, TensorRT, TensorRT-RTX, CoreML, OpenVINO, and more

Examples

```shell
# CPU: Object detection with YOLO26n (FP16)
cargo run -r --example yolo -- --task detect --ver 26 --scale n --dtype fp16

# CUDA model + CPU processor: Instance segmentation with YOLO11m
cargo run -r -F cuda --example yolo -- --task segment --ver 11 --scale m --device cuda:0 --processor-device cpu

# CUDA model + CUDA processor: Pose estimation with YOLOv8s
cargo run -r -F cuda-full --example yolo -- --task pose --ver 8 --scale s --device cuda:0 --processor-device cuda:0

# TensorRT model + CPU processor
cargo run -r -F tensorrt --example yolo -- --device tensorrt:0 --processor-device cpu

# TensorRT model + CUDA processor (CUDA 12.4)
cargo run -r -F tensorrt-cuda-12040 --example yolo -- --device tensorrt:0 --processor-device cuda:0

# TensorRT-RTX model + CUDA processor
cargo run -r -F nvrtx-full --example yolo -- --device nvrtx:0 --processor-device cuda:0

# TensorRT-RTX model + CPU processor
cargo run -r -F nvrtx --example yolo -- --device nvrtx:0

# Apple Silicon CoreML
cargo run -r -F coreml --example yolo -- --device coreml

# Intel OpenVINO (CPU/GPU/VPU)
cargo run -r -F openvino -F ort-load-dynamic --example yolo -- --device openvino:CPU

# Show all available options
cargo run -r --example yolo -- --help
```
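
To sweep several configurations from a script, the same invocations can be assembled programmatically. This sketch only uses flags that appear in the examples above (`--task`, `--ver`, `--scale`, `--dtype`, `--device`); the helper name `yolo_command` is hypothetical, and the command is built and printed but not spawned.

```rust
use std::process::Command;

/// Build (but do not run) a `cargo run` invocation of the YOLO example.
/// Hypothetical helper: the flags mirror the README examples.
fn yolo_command(task: &str, ver: &str, scale: &str, dtype: &str, device: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "-r", "--example", "yolo", "--"])
        .args(["--task", task, "--ver", ver, "--scale", scale])
        .args(["--dtype", dtype, "--device", device]);
    cmd
}

fn main() {
    // Sweep every scale of YOLO26 detection on CPU with FP16 weights.
    for scale in ["n", "s", "m", "l", "x"] {
        let cmd = yolo_command("detect", "26", scale, "fp16", "cpu");
        let args: Vec<String> = cmd
            .get_args()
            .map(|a| a.to_string_lossy().into_owned())
            .collect();
        println!("cargo {}", args.join(" "));
        // Spawning is left out; call `.status()` on a mutable Command to run it.
    }
}
```

GPU runs additionally need the matching `-F` feature, which would go before `--example` in the argument list.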

See YOLO Examples for more details and use cases.

See the Device Combination Guide below for feature and device configurations.

Performance

Environment: NVIDIA RTX 3060Ti (TensorRT-10.11.0.33, CUDA 12.8, TensorRT-RTX-1.3.0.35) / Intel i5-12400F

Setup: YOLO26n, COCO2017 validation set (5,000 images), Resolution: 640x640, Conf thresholds: [0.35, 0.3, ..]

Results are for rough reference only.

| Execution Provider | Image Processor | DType | Batch | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|---|---|
| TensorRT | CUDA | FP16 | 1 | ~233µs | ~1.3ms | ~14µs | ~1.55ms |
| TensorRT-RTX | CUDA | FP32 | 1 | ~233µs | ~2.0ms | ~10µs | ~2.24ms |
| TensorRT-RTX | CUDA | FP16 | 1 | ❓ | ❓ | ❓ | ❓ |
| CUDA | CUDA | FP32 | 1 | ~233µs | ~5.0ms | ~17µs | ~5.25ms |
| CUDA | CUDA | FP16 | 1 | ~233µs | ~3.6ms | ~17µs | ~3.85ms |
| CUDA | CPU | FP32 | 1 | ~800µs | ~6.5ms | ~14µs | ~7.31ms |
| CUDA | CPU | FP16 | 1 | ~800µs | ~5.0ms | ~14µs | ~5.81ms |
| CPU | CPU | FP32 | 1 | ~970µs | ~20.5ms | ~14µs | ~21.48ms |
| CPU | CPU | FP16 | 1 | ~970µs | ~25.0ms | ~14µs | ~25.98ms |
| TensorRT | CUDA | FP16 | 8 | ~1.2ms | ~6.0ms | ~55µs | ~7.26ms |
| TensorRT | CPU | FP16 | 8 | ~18.0ms | ~25.5ms | ~55µs | ~43.56ms |
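
The Total column is the sum of the three stage timings. The sketch below recomputes a few rows from the table (in milliseconds) and derives a rough per-image latency for the batch-8 row; that last step assumes the batch-8 timings cover the whole batch, which the table does not state explicitly.

```rust
/// One row of the benchmark table; all times in milliseconds.
struct Row {
    label: &'static str,
    batch: u32,
    pre_ms: f64,
    infer_ms: f64,
    post_ms: f64,
}

impl Row {
    /// End-to-end latency = preprocess + inference + postprocess.
    fn total_ms(&self) -> f64 {
        self.pre_ms + self.infer_ms + self.post_ms
    }

    /// Rough per-image latency, ASSUMING the timings cover the whole batch.
    fn per_image_ms(&self) -> f64 {
        self.total_ms() / self.batch as f64
    }
}

fn main() {
    // Values copied from the table above.
    let rows = [
        Row { label: "TensorRT + CUDA proc, FP16", batch: 1, pre_ms: 0.233, infer_ms: 1.3, post_ms: 0.014 },
        Row { label: "CUDA + CPU proc, FP32", batch: 1, pre_ms: 0.800, infer_ms: 6.5, post_ms: 0.014 },
        Row { label: "CPU + CPU proc, FP32", batch: 1, pre_ms: 0.970, infer_ms: 20.5, post_ms: 0.014 },
        Row { label: "TensorRT + CUDA proc, FP16", batch: 8, pre_ms: 1.2, infer_ms: 6.0, post_ms: 0.055 },
    ];
    for r in &rows {
        println!(
            "{:<28} batch={} total={:.2} ms per_image={:.2} ms",
            r.label, r.batch, r.total_ms(), r.per_image_ms()
        );
    }
}
```

The same arithmetic shows why the processor device matters: with a CPU image processor, preprocessing alone (~0.8 to ~1 ms at batch 1) can cost as much as the whole TensorRT inference step.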

📦 Model Zoo

Status: ✅ Supported | ❓ Unknown | ❌ Not Supported For Now

🔥 YOLO-Series

| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv5 | Image Classification, Object Detection, Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv8 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLO11 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| YOLOv12 | Image Classification, Object Detection, Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv13 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO26 | Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

🏷️ Image Classification & Tagging

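| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| RAM | Image Tagging | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RAM++ | Image Tagging | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |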

🎯 Object Detection

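| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| RT-DETRv1 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETRv2 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETRv4 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| PP-PicoDet | Object Detection | demo | ❌ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| D-FINE | Object Detection | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| DEIM | Object Detection | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| DEIMv2 | Object Detection | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |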

🎨 Image Segmentation

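| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| SAM | Segment Anything | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| SAM-HQ | Segment Anything | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| MobileSAM | Segment Anything | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| EdgeSAM | Segment Anything | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAM2 | Segment Anything | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| SAM3-Tracker | Segment Anything | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |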

🗺️ Open-Set Detection & Segmentation

| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MM-GDINO | Open-Set Detection With Language | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LLMDet | Open-Set Detection With Language | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOE | Open-Set Detection And Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAM3-Image | Open-Set Detection And Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

✨ Background Removal

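| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| RMBG | Image Segmentation, Background Removal | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEN2 | Image Segmentation, Background Removal | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |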

🏃 Multi-Object Tracking

| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| ByteTrack | Multi-Object Tracking | demo | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

💎 Image Super-Resolution

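| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| Swin2SR | Image Restoration | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| APISR | Anime Super-Resolution | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |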

✂️ Image Matting

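| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| MODNet | Image Matting | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |
| MediaPipe Selfie | Image Segmentation | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ❌ | ❌ |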

🤸 Pose Estimation

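| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| RTMPose | Keypoint Detection | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DWPose | Keypoint Detection | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RTMW | Keypoint Detection | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RTMO | Keypoint Detection | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ❌ |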

🔍 OCR & Document Understanding

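| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| DB | Text Detection | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| FAST | Text Detection | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| LinkNet | Text Detection | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| SVTR | Text Recognition | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| TrOCR | Text Recognition | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| SLANet | Table Recognition | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |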

🧩 Vision-Language Models (VLM)

| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| BLIP | Image Captioning | demo | ✅ | ❓ | ✅ | ❓ | ❌ | ❌ | ❌ |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ❓ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Moondream2 | Open-Set Object Detection, Open-Set Keypoint Detection, Image Captioning, Visual Question Answering | demo | ✅ | ❓ | ❌ | ❌ | ✅ | ✅ | ❌ |
| SmolVLM | Visual Question Answering | demo | ✅ | ❓ | ✅ | ❓ | ❓ | ❓ | ❓ |
| SmolVLM2 | Visual Question Answering | demo | ✅ | ❓ | ✅ | ❓ | ❓ | ❓ | ❓ |
| FastVLM | Vision Language Models | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |

🧬 Embedding Models

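| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| CLIP | Vision-Language Embedding | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| jina-clip-v2 | Vision-Language Embedding | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| mobileclip | Vision-Language Embedding | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DINOv2 | Vision Embedding | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |
| DINOv3 | Vision Embedding | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |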

📐 Depth Estimation

| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| DepthAnything v1 | Monocular Depth Estimation | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Depth-Anything-3 | Monocular, Metric, Multi-View | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |

🌌 Others

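| Model | Task / Description | Demo | Dynamic Batch | TensorRT | FP32 | FP16 | Q8 | Q4f16 | BNB4 |
|---|---|---|---|---|---|---|---|---|---|
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ❓ | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOPv2 | Panoptic Driving | demo | ✅ | ❓ | ✅ | ❌ | ❌ | ❌ | ❌ |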

Documentation

🔧 Cargo Features

❕ Features in italics are enabled by default.

Core & Utilities

- ort-download-binaries: Automatically download prebuilt ONNX Runtime binaries from pyke.
- ort-load-dynamic: Load the ONNX Runtime shared library at runtime instead of linking it at build time. Useful for custom builds or unsupported platforms. See Linking Guide for more details.
- viewer: Real-time image/video visualization (similar to OpenCV imshow). Powered by minifb.
- video: Video I/O support for reading and writing video streams. Powered by video-rs.
- hf-hub: Download model files from Hugging Face Hub.
- annotator: Annotation utilities for drawing bounding boxes, keypoints, and masks on images.

Image Formats

Additional image format support (opt-in, so default compile times stay short):
- image-all-formats: Enable all additional image formats.
- image-gif, image-bmp, image-ico, image-avif, image-tiff, image-dds, image-exr, image-ff, image-hdr, image-pnm, image-qoi, image-tga: Individual image format support.

Model Categories

- vision: Core vision models (Detection, Segmentation, Classification, Pose, etc.).
- vlm: Vision-Language Models (CLIP, BLIP, Florence2, etc.).
- mot: Multi-Object Tracking utilities.
- all-models: Enable all model categories.

Execution Providers

Hardware acceleration for inference. Enable the one matching your hardware:
- cuda: NVIDIA CUDA execution provider (pure model inference acceleration).
- tensorrt: NVIDIA TensorRT execution provider (pure model inference acceleration).
- nvrtx: NVIDIA NvTensorRT-RTX execution provider (pure model inference acceleration).
- cuda-full: cuda + cuda-runtime-build (Model + Image Preprocessing acceleration).
- tensorrt-full: tensorrt + cuda-runtime-build (Model + Image Preprocessing acceleration).
- nvrtx-full: nvrtx + cuda-runtime-build (Model + Image Preprocessing acceleration).
- coreml: Apple Silicon (macOS/iOS).
- openvino: Intel CPU/GPU/VPU.
- onednn: Intel oneAPI Deep Neural Network Library (oneDNN).
- directml: DirectML (Windows).
- webgpu: WebGPU (Web/Chrome).
- rocm: AMD GPU acceleration.
- cann: Huawei Ascend NPU.
- rknpu: Rockchip NPU.
- xnnpack: Mobile CPU optimization.
- acl: Arm Compute Library.
- armnn: Arm Neural Network SDK.
- azure: Azure ML execution provider.
- migraphx: AMD MIGraphX.
- nnapi: Android Neural Networks API.
- qnn: Qualcomm AI Engine Direct (QNN).
- tvm: Apache TVM.
- vitis: Xilinx Vitis AI.

CUDA Support

NVIDIA GPU acceleration with CUDA image processing kernels (requires cudarc):
- cuda-full: Uses cuda-version-from-build-system (auto-detects via nvcc).
- cuda-11040, cuda-11050, cuda-11060, cuda-11070, cuda-11080: CUDA 11.x versions (Model + Preprocess).
- cuda-12000, cuda-12010, cuda-12020, cuda-12030, cuda-12040, cuda-12050, cuda-12060, cuda-12080, cuda-12090: CUDA 12.x versions (Model + Preprocess).
- cuda-13000, cuda-13010: CUDA 13.x versions (Model + Preprocess).

TensorRT Support

NVIDIA TensorRT execution provider with CUDA runtime libraries:
- tensorrt-full: Uses cuda-version-from-build-system (auto-detects via nvcc).
- tensorrt-cuda-11040, tensorrt-cuda-11050, tensorrt-cuda-11060, tensorrt-cuda-11070, tensorrt-cuda-11080: TensorRT + CUDA 11.x runtime.
- tensorrt-cuda-12000, tensorrt-cuda-12010, tensorrt-cuda-12020, tensorrt-cuda-12030, tensorrt-cuda-12040, tensorrt-cuda-12050, tensorrt-cuda-12060, tensorrt-cuda-12080, tensorrt-cuda-12090: TensorRT + CUDA 12.x runtime.
- tensorrt-cuda-13000, tensorrt-cuda-13010: TensorRT + CUDA 13.x runtime.

Note: The tensorrt-cuda-* features enable the TensorRT execution provider together with CUDA runtime libraries for image processing. The "cuda" in the feature name refers to the cudarc dependency.

NVRTX Support

NVIDIA NvTensorRT-RTX execution provider with CUDA runtime libraries:
- nvrtx-full: Uses cuda-version-from-build-system (auto-detects via nvcc).
- nvrtx-cuda-11040, nvrtx-cuda-11050, nvrtx-cuda-11060, nvrtx-cuda-11070, nvrtx-cuda-11080: NVRTX + CUDA 11.x runtime.
- nvrtx-cuda-12000, nvrtx-cuda-12010, nvrtx-cuda-12020, nvrtx-cuda-12030, nvrtx-cuda-12040, nvrtx-cuda-12050, nvrtx-cuda-12060, nvrtx-cuda-12080, nvrtx-cuda-12090: NVRTX + CUDA 12.x runtime.
- nvrtx-cuda-13000, nvrtx-cuda-13010: NVRTX + CUDA 13.x runtime.

Note: The nvrtx-cuda-* features enable the NVRTX execution provider together with CUDA runtime libraries for image processing. The "cuda" in the feature name refers to the cudarc dependency.
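
Judging from the feature lists above, and from the Quick Start example that pairs `tensorrt-cuda-12040` with CUDA 12.4, the numeric suffix appears to encode a CUDA version as `major * 1000 + minor * 10`. The sketch below assumes that encoding, maps an installed CUDA version to a feature name, and returns `None` for versions that have no listed feature (12.7, for example, is absent from the lists). The function name `cuda_feature` is hypothetical.

```rust
/// CUDA version suffixes listed in the feature tables above.
const SUPPORTED: &[u32] = &[
    11040, 11050, 11060, 11070, 11080,
    12000, 12010, 12020, 12030, 12040, 12050, 12060, 12080, 12090,
    13000, 13010,
];

/// Map a CUDA version to a Cargo feature name, ASSUMING the suffix is
/// `major * 1000 + minor * 10`. `prefix` is "cuda", "tensorrt-cuda",
/// or "nvrtx-cuda". Returns None when no feature is listed for that version.
fn cuda_feature(prefix: &str, major: u32, minor: u32) -> Option<String> {
    let code = major * 1000 + minor * 10;
    SUPPORTED.contains(&code).then(|| format!("{prefix}-{code}"))
}

fn main() {
    for (major, minor) in [(11, 8), (12, 4), (12, 7), (13, 1)] {
        match cuda_feature("tensorrt-cuda", major, minor) {
            Some(f) => println!("CUDA {major}.{minor} -> -F {f}"),
            None => println!("CUDA {major}.{minor} -> no pinned feature; consider tensorrt-full (auto-detects via nvcc)"),
        }
    }
}
```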

🚀 Device Combination Guide

| Scenario | Model Device (`--device`) | Processor Device (`--processor-device`) | Required Features (`-F`) |
|---|---|---|---|
| CPU Only | `cpu` | `cpu` | `vision` (default) |
| GPU Inference (Slow Preprocess) | `cuda` | `cpu` | `cuda` |
| GPU Inference (Fast Preprocess) | `cuda` | `cuda` | `cuda-full` or `cuda-<version>` (e.g., `cuda-12040`) |
| TensorRT (Slow Preprocess) | `tensorrt` | `cpu` | `tensorrt` |
| TensorRT (Fast Preprocess) | `tensorrt` | `cuda` | `tensorrt-full` or `tensorrt-cuda-<version>` (e.g., `tensorrt-cuda-12040`) |

⚠️ In multi-GPU environments (e.g., cuda:0, cuda:1), you MUST ensure that both --device and --processor-device use the SAME GPU ID.
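
The combination table and the same-GPU rule can be encoded as a small pre-flight check before launching a run. This is a hypothetical helper, not part of usls: it treats the number after `:` as the GPU ID (as in `cuda:0` or `tensorrt:0`), rejects mismatched IDs, and otherwise looks up the required features from the table above.

```rust
/// Split "kind:id" (e.g. "cuda:0", "tensorrt:1") into (kind, Some(id)).
/// A bare "cpu" or a non-numeric suffix yields (kind, None).
fn parse_device(s: &str) -> (&str, Option<u32>) {
    match s.split_once(':') {
        Some((kind, id)) => (kind, id.parse().ok()),
        None => (s, None),
    }
}

/// Hypothetical pre-flight check mirroring the Device Combination Guide.
/// Returns the feature(s) to pass with `-F`, or an error message.
fn required_features(device: &str, processor: &str) -> Result<&'static str, String> {
    let (model_kind, model_id) = parse_device(device);
    let (proc_kind, proc_id) = parse_device(processor);

    // Multi-GPU rule: model and processor must sit on the same GPU.
    if let (Some(m), Some(p)) = (model_id, proc_id) {
        if m != p {
            return Err(format!("GPU ID mismatch: --device uses {m}, --processor-device uses {p}"));
        }
    }

    match (model_kind, proc_kind) {
        ("cpu", "cpu") => Ok("vision (default)"),
        ("cuda", "cpu") => Ok("cuda"),
        ("cuda", "cuda") => Ok("cuda-full or cuda-<version>"),
        ("tensorrt", "cpu") => Ok("tensorrt"),
        ("tensorrt", "cuda") => Ok("tensorrt-full or tensorrt-cuda-<version>"),
        _ => Err(format!("{device} + {processor} is not covered by the guide")),
    }
}

fn main() {
    for (d, p) in [("cuda:0", "cuda:0"), ("tensorrt:0", "cpu"), ("cuda:0", "cuda:1")] {
        println!("{d} + {p} -> {:?}", required_features(d, p));
    }
}
```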

❓ FAQ

ONNX Runtime Issues: For ONNX Runtime related errors, please check the ort issues or onnxruntime issues.
Other Issues: For other questions or bug reports, see issues or open a new discussion.

⚠️ Compatibility Note

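If you encounter linking errors involving `__isoc23_strtoll` or similar glibc symbols (typically a glibc version mismatch between the prebuilt ONNX Runtime binaries and your system), use the dynamic loading feature:

```shell
# Replace <example> with the example to run, e.g. yolo
cargo run -F ort-load-dynamic --example <example>
```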

Why no LM models?

This project focuses on vision and VLM models under 1B parameters for efficient inference.

Many high-performance inference engines, such as vLLM, already exist for language models (LMs/LLMs).

Pure text embedding models may be considered in future releases.

How fast is it?

Refer to YOLO performance benchmarks in the Performance section above.

This project uses multi-threading, SIMD, and CUDA hardware acceleration for optimization.

While vision models such as YOLO and RF-DETR are already optimized, other models may still need further optimization of their interfaces and post-processing.
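
As a generic illustration of the multi-threading mentioned above (not usls's actual preprocessing code), the sketch below normalizes an interleaved RGB `u8` buffer to `f32` in parallel, giving each thread a disjoint block of whole rows via `std::thread::scope`. The function name is hypothetical, and the ImageNet mean/std constants in `main` are example values only.

```rust
use std::thread;

/// Normalize interleaved RGB `u8` pixels (HWC layout) to `f32` using
/// per-channel mean/std, splitting rows across `n_threads` scoped threads.
fn normalize_parallel(
    src: &[u8],
    width: usize,
    mean: [f32; 3],
    stdev: [f32; 3],
    n_threads: usize,
) -> Vec<f32> {
    let row_len = width * 3;
    assert!(row_len > 0 && src.len() % row_len == 0, "buffer must contain whole RGB rows");
    let rows = src.len() / row_len;
    let n = n_threads.max(1);
    // Rows per thread, rounded up so every row is covered.
    let rows_per_chunk = ((rows + n - 1) / n).max(1);
    // Chunks are whole rows (a multiple of 3), so `index % 3` is the channel.
    let chunk = rows_per_chunk * row_len;

    let mut dst = vec![0.0f32; src.len()];
    thread::scope(|s| {
        // Each thread writes its own disjoint slice of `dst`: no locking needed.
        for (src_chunk, dst_chunk) in src.chunks(chunk).zip(dst.chunks_mut(chunk)) {
            s.spawn(move || {
                for (i, (&p, out)) in src_chunk.iter().zip(dst_chunk.iter_mut()).enumerate() {
                    let c = i % 3;
                    *out = (p as f32 / 255.0 - mean[c]) / stdev[c];
                }
            });
        }
    });
    dst
}

fn main() {
    // 2x2 RGB image: white, black, gray, red.
    let img: Vec<u8> = vec![255, 255, 255, 0, 0, 0, 128, 128, 128, 255, 0, 0];
    let mean = [0.485, 0.456, 0.406]; // ImageNet mean (example values)
    let stdev = [0.229, 0.224, 0.225]; // ImageNet std (example values)
    let out = normalize_parallel(&img, 2, mean, stdev, 2);
    println!("{:?}", out);
}
```

Splitting on row boundaries keeps each thread's writes contiguous and lets the inner loop stay a simple element-wise map, which is also the kind of loop compilers can auto-vectorize with SIMD.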

🤝 Contributing

This is a personal project maintained in spare time, so progress on performance optimization and new model support may vary.

PRs for model optimization are very welcome! If you have expertise in specific models and can help optimize their interfaces or post-processing, your contributions would be invaluable. Feel free to open an issue or submit a pull request for suggestions, bug reports, or new features.

🙏 Acknowledgments

This project is built on top of ort (ONNX Runtime for Rust), which provides seamless Rust bindings for ONNX Runtime. Special thanks to the ort maintainers.
Special thanks to @kadu-v for the jamtrack-rs project, which inspired our ByteTracker implementation.

Thanks to all the open-source libraries and their maintainers that make this project possible. See Cargo.toml for a complete list of dependencies.

📜 License

This project is licensed under the terms of the LICENSE file.
