
Update dependency diffusers to v0.37.0 #39

Open
renovate[bot] wants to merge 1 commit into main from renovate/diffusers-0.x

Conversation


@renovate renovate bot commented Mar 5, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package: diffusers
Change: ==0.36.0 → ==0.37.0

Release Notes

huggingface/diffusers (diffusers)

v0.37.0: Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥

Compare Source

Modular Diffusers

Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.

Find more details on how to get started with Modular Diffusers here, and also check out the announcement post.
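The composition idea can be sketched in plain Python. This is a conceptual illustration only, not the actual Modular Diffusers API: hypothetical stage blocks pass a shared state dict along, and swapping any block customizes the workflow.

```python
# Conceptual sketch of block composition (NOT the Modular Diffusers API).
# Each block handles one stage of a diffusion workflow and mutates shared state.

class Block:
    def __call__(self, state: dict) -> dict:
        raise NotImplementedError

class EncodePrompt(Block):
    def __call__(self, state):
        state["embeds"] = f"embeds({state['prompt']})"
        return state

class Denoise(Block):
    def __call__(self, state):
        state["latents"] = f"denoised({state['embeds']})"
        return state

class Decode(Block):
    def __call__(self, state):
        state["image"] = f"image({state['latents']})"
        return state

class SequentialBlocks(Block):
    """Compose blocks into one pipeline; replace any block to customize."""
    def __init__(self, *blocks):
        self.blocks = blocks

    def __call__(self, state):
        for block in self.blocks:
            state = block(state)
        return state

pipeline = SequentialBlocks(EncodePrompt(), Denoise(), Decode())
result = pipeline({"prompt": "a cat"})
# result["image"] == "image(denoised(embeds(a cat)))"
```

The point of the pattern is that a custom workflow (say, a different denoising loop) only needs to swap one block rather than rewrite the whole pipeline.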

New Pipelines and Models

Image 🌆
  • Z Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @​RuoyiDu for contributing this in #​12857.
  • Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and it runs on consumer hardware with as little as 13 GB of VRAM.
  • Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @​naykun for contributing this in #​12853.
  • FIBO Edit: Fibo Edit is an 8B-parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to galbria for contributing it in #​12930.
  • Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @​miguelmartin75 for contributing it in #​12852.
  • Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can take different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @​miguelmartin75 for contributing it in #​13066.
  • GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality, it aligns with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @​zRzRzRzRzRzRzR for contributing it in #​12973.
  • RAE: Representation Autoencoders (RAEs) are an exciting alternative to traditional VAEs, typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction.
Video + audio 🎥 🎼
  • LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Full and distilled model inference, as well as two-stage inference with spatial sampling, is supported. We also support a conditioning pipeline that allows for passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
  • Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @​SHYuanBest for contributing this in #​13208.
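The layered RGBA representation behind Qwen-Image-Layered can be illustrated with standard Porter-Duff "over" compositing. This is a minimal sketch of the general technique, not the model's actual implementation: each layer can be edited independently, and the final image is rebuilt by compositing layers back together.

```python
def over(top, bottom):
    """Porter-Duff 'over': composite one RGBA pixel onto another.
    Channels are floats in [0, 1]; colors are straight (non-premultiplied)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    a = ta + ba * (1 - ta)  # resulting alpha
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda t, b: (t * ta + b * ba * (1 - ta)) / a
    return (blend(tr, br), blend(tg, bg), blend(tb, bb), a)

# Editing one layer leaves the others untouched; re-compositing rebuilds the image.
background = (1.0, 1.0, 1.0, 1.0)  # opaque white
layer = (1.0, 0.0, 0.0, 0.5)       # half-transparent red
pixel = over(layer, background)
# pixel == (1.0, 0.5, 0.5, 1.0): red blended onto white
```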

Improvements to Core Library

New caching methods
New context-parallelism (CP) backends
Misc
  • Mambo-G Guidance: New guider implementation (#​12862)
  • Laplace Scheduler for DDPM (#​11320)
  • Custom Sigmas in UniPCMultistepScheduler (#​12109)
  • MultiControlNet support for SD3 Inpainting (#​11251)
  • Context parallel in native flash attention (#​12829)
  • NPU Ulysses Attention Support (#​12919)
  • Fix Wan 2.1 I2V Context Parallel Inference (#​12909)
  • Fix Qwen-Image Context Parallel Inference (#​12970)
  • Introduction of the @apply_lora_scale decorator for simplifying model definitions (#​12994)
  • Introduction of pipeline-level “cpu” device_map (#​12811)
  • Enable CP for kernels-based attention backends (#​12812)
  • Diffusers is fully functional with Transformers V5 (#​12976)
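For the custom-sigmas improvement, a noise schedule can be computed externally and handed to the scheduler. Below is a plain-Python sketch of a Karras-style sigma schedule; the sigma_min/sigma_max values are illustrative, and the exact argument the scheduler accepts should be checked against the diffusers docs.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Karras et al. spacing: interpolate linearly in sigma^(1/rho), then
    raise back to the rho-th power, giving more steps at low noise levels."""
    ramp = [i / (n - 1) for i in range(n)]
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp]

sigmas = karras_sigmas(10)
# Strictly decreasing from sigma_max down to sigma_min, as schedulers expect.
```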

A lot of the above features/improvements came as part of the MVP program we have been running. Immense thanks to the contributors!

Bug Fixes

  • Fix QwenImageEditPlus on NPU (#​13017)
  • Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#​12877)
  • Fix Wan/WanI2V patchification (#​13038)
  • Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#​13121)
  • Fix Flux2 img2img prediction (#​12855)
  • Fix QwenImage txt_seq_lens handling (#​12702)
  • Fix prefix_token_len bug (#​12845)
  • Fix ftfy imports in Wan and SkyReels-V2 (#​12314, #​13113)
  • Fix is_fsdp determination (#​12960)
  • Fix GLM-Image get_image_features API (#​13052)
  • Fix Wan 2.2 when either transformer isn't present (#​13055)
  • Fix guider issue (#​13147)
  • Fix torchao quantizer for new versions (#​12901)
  • Fix GGUF for unquantized types with unquantize kernels (#​12498)
  • Make Qwen hidden states contiguous for torchao (#​13081)
  • Make Flux hidden states contiguous (#​13068)
  • Fix Kandinsky 5 hardcoded CUDA autocast (#​12814)
  • Fix aiter availability check (#​13059)
  • Fix attention mask check for unsupported backends (#​12892)
  • Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#​13092)
  • GLM-Image batch support (#​13007)
  • Cosmos 2.5 Video2World frame extraction fix (#​13018)
  • ResNet: only use contiguous in training mode (#​12977)

All commits


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.
