| Model | Architecture | Year | Author | Scale | Censored? | Quality/Status |
|---|---|---|---|---|---|---|
| Stable Diffusion XL | unet | 2023 | Stability AI | 2B | Partial | Old but some finetunes remain popular |
| SD1 and SDXL Turbo Variants | unet | 2023 | Stability AI and others | 2B | Partial | Outdated |
| Stable Diffusion 3 | MMDiT | 2024 | Stability AI | 2B | Yes | Outdated, prefer 3.5 |
| Stable Diffusion 3.5 Large | MMDiT | 2024 | Stability AI | 8B | Partial | Outdated, Good Quality for its time |
| Stable Diffusion 3.5 Medium | MMDiT | 2024 | Stability AI | 2B | Partial | Outdated, Good Quality for its time |
| AuraFlow | MMDiT | 2024 | Fal.AI | 6B | Yes | Outdated |
| Flux.1 | MMDiT | 2024 | Black Forest Labs | 12B | Partial | Outdated, High Quality for its time |
| Flux.2 | MMDiT | 2025 | Black Forest Labs | 4B, 9B, 32B | Minimal | Recent, Incredible Quality, choice of speed or quality preference |
| Chroma | MMDiT | 2025 | Lodestone Rock | 8.9B | No | Recent, Decent Quality |
| Chroma Radiance | Pixel MMDiT | 2025 | Lodestone Rock | 8.9B | No | Recent, Bad Quality (WIP) |
| Lumina 2.0 | NextDiT | 2025 | Alpha-VLLM | 2.6B | Partial | Modern, Passable Quality |
| Qwen Image | MMDiT | 2025 | Alibaba-Qwen | 20B | Minimal | Modern, Great Quality, very memory intense |
| Hunyuan Image 2.1 | MMDiT | 2025 | Tencent | 17B | No | Modern, Great Quality, very memory intense |
| Z-Image | S3-DiT | 2025 | Tongyi MAI (Alibaba) | 6B | No | Modern, Great Quality, lightweight |
| Kandinsky 5 | DiT | 2025 | Kandinsky Lab | 6B | No | Modern, Decent Quality |
| Anima | DiT | 2026 | Circlestone Labs | 2B | WTF | Modern, very small, decent for anime |
Old or bad options are also tracked via Obscure Model Support:
| Model | Architecture | Year | Author | Scale | Censored? | Quality/Status |
|---|---|---|---|---|---|---|
| Stable Diffusion v1 and v2 | unet | 2022 | Stability AI | 1B | No | Outdated |
| Stable Diffusion v1 Inpainting Models | unet | 2022 | RunwayML | 1B | No | Outdated |
| Segmind SSD 1B | unet | 2023 | Segmind | 1B | Partial | Outdated |
| Stable Cascade | unet cascade | 2024 | Stability AI | 5B | Partial | Outdated |
| PixArt Sigma | DiT | 2024 | PixArt | 1B | ? | Outdated |
| Nvidia Sana | DiT | 2024 | NVIDIA | 1.6B | No | Just Bad |
| Nvidia Cosmos Predict2 | DiT | 2025 | NVIDIA | 2B/14B | Partial | Just Bad |
| HiDream i1 | MMDiT | 2025 | HiDream AI (Vivago) | 17B | Minimal | Good Quality, lost community attention |
| OmniGen 2 | MLLM | 2025 | VectorSpaceLab | 7B | No | Modern, Decent Quality, quickly outclassed |
| Ovis | MMDiT | 2025 | AIDC-AI (Alibaba) | 7B | No | Passable quality, but outclassed on launch |
- Architecture is the fundamental machine learning structure used by the model. UNets were used in the past, but DiT (Diffusion Transformers) are the modern choice.
- Scale is how big the model is - "B" for "Billion", so for example "2B" means "Two billion parameters".
- One parameter is one number value, so for example in fp16 (16 bit, ie 2 bytes per number), a 2B model is 4 gigabytes. In fp8 (8 bit, ie 1 byte per number), a 2B model is 2 gigabytes.
- If you often use fp8 or q8 models, just read the "B" as "gigabytes" for a good approximation
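The size arithmetic above can be sketched in a few lines of Python (a rough estimate only - real files also carry metadata, and text encoders/VAEs are separate downloads on top):

```python
def approx_model_gb(params_billions: float, bits_per_param: int) -> float:
    """Rough on-disk size: parameter count times bits per parameter, in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 2B model: 4 GB in fp16 (2 bytes per param), 2 GB in fp8 (1 byte per param),
# which is why "read the B as gigabytes" works as a rule of thumb at 8 bits.
print(approx_model_gb(2, 16))  # 4.0
print(approx_model_gb(2, 8))   # 2.0
```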
- Censored? is tested by generating eg "a photo of a naked woman" on the model.
- This test only refers to the base models, finetunes can add nudity and other "risque" content back in.
- Most base models will not generate genitalia, and have limited quality with other body parts and poses. Every popular model has finetunes available to add those capabilities, if you want them.
- Sometimes it's not even intentional censorship, just the simple fact that broad base models aren't good at any one thing - so, again, content-specific finetunes fix that.
- Model censorship can take other forms (eg does it recognize names of celebrities/artists/brands, can it do gore, etc.), so if a model sounds right to you, you may want to do your own testing to see if it's capable of the type of content you like
- "No" means it generates what was asked,
- "Minimal" means it's eg missing genitals but otherwise complete,
- "Partial" means it's clearly undertrained at NSFW content (eg difficult to prompt for or poor quality body) but doesn't explicitly refuse,
- "Yes" means it's entirely incapable or provides an explicit refusal response,
- "WTF" means it's the opposite of censored, may generate inappropriate content even without being asked.
- Quality/Status is a loose vibe-based metric to imply whether it's worth using in the current year or not.
- Video models are in Video Model Support
- Audio models are in Audio Model Support
Image model(s) most worth using, as of January 2026:
- Z-Image is the best right now, especially for photoreal gens.
- Flux.2 Klein is pretty great too, for Editing or for art style variety.
- Flux.2 Dev is massive, but is the smartest of the bunch if you have the hardware and patience for it.
- Swarm natively supports `.safetensors` format models with ModelSpec metadata
    - It can also import metadata from some legacy formats used by other UIs (auto webui thumbnails, matrix jsons, etc.)
    - It can also fall back to a `.swarm.json` sidecar file for other supported file formats
- Swarm can load other model file formats, see Alternative Model Formats
- Notably, quantization technique formats. "Quantization" means shrinking a model so it uses less memory than it normally would.
- Normal sizes are named like "BF16", "FP16", "FP8", ... ("BF"/"FP" prefixes are standard formats)
- Quantized sizes have names like "NF4", "Q4_K_M", "Q8", "SVDQ-4", "Int-4", ("Q" means quantized, but there are technique-specific labels)
- BnB NF4 (not recommended, quantization technique)
- GGUF (recommended, good quality quantization technique, slower speed)
- Nunchaku (very recommended, great quality high speed quantization technique)
- TensorRT (not recommended, speedup technique)
- Image demos included below are mini-grids of seeds `1, 2, 3` of the prompt `wide shot, photo of a cat with mixed black and white fur, sitting in the middle of an open roadway, holding a cardboard sign that says "Meow I'm a Cat". In the distance behind is a green road sign that says "Model Testing Street".` run on each model.
    - For all models, "standard parameters" are used.
- Steps is set to 20 except for Turbo models. Turbo models are run at their standard fast steps (usually 4).
- CFG is set appropriate to the model.
- Resolution is model default.
- This prompt is designed to require (1) multiple complex components (2) photorealism (3) text (4) impossible actions (cat holding a sign - Most models get very confused how to do this).
- All generations are done on the base model of the relevant class, not on any finetune/lora/etc. Finetunes are likely to significantly change the qualitative capabilities, but unlikely to significantly change general ability to understand and follow prompts.
- This is not a magic perfect test prompt, just a decent coverage of range to showcase approximately what you can expect from the model in terms of understanding and handling challenges.
- You could make a point that maybe I should have set CFG different or used a sigma value or changed up prompt phrasing or etc. and get better quality - this test intentionally uses very bland parameters to maximize identical comparison. Keep in mind that you can get better results out of a model by fiddling parameters.
- You'll note models started being able to do decently well on this test in late 2024. Older models noticeably fail at the basic requirements of this test.
SDXL models work as normal, with the bonus that by default enhanced inference settings will be used (eg scaled up rescond).
Additionally, SDXL-Refiner architecture models can be inferenced, both as a refiner or even as a base (you must manually set res to 512x512, and it will generate weird results).
- There are official SDXL ControlNet LoRAs from Stability AI here
- and there's a general collection of community ControlNet models here that you can use.
Turbo, LCM (Latent Consistency Models), Lightning, etc. models work the same as regular models, just set CFG Scale to 1 and:
- For Turbo, set Steps to 1. Under the Sampling group, set Scheduler to Turbo.
- For LCM, set Steps to 4. Under the Sampling group, set Sampler to lcm.
- For Lightning, (?)
Stable Diffusion 3 Medium is supported and works as normal.
By default the first time you run an SD3 model, Swarm will download the text encoders for you.
Under the Sampling parameters group, a parameter named SD3 TextEncs is available to select whether to use CLIP, T5, or both. By default, CLIP is used (no T5) as results are near-identical but CLIP-only has much better performance, especially on systems with limited resources.
Under Advanced Sampling, the parameter Sigma Shift is available. This defaults to 3 on SD3, but you can lower it to around ~1.5 if you wish to experiment with different values. Messing with this value too much is not recommended.
For upscaling with SD3, the Refiner Do Tiling parameter is highly recommended (SD3 does not respond well to regular upscaling without tiling).
- Stable Diffusion 3.5 Large is supported and works as normal, including both normal and Turbo variants.
- The TensorArt 3.5L TurboX works too, just set `CFG Scale` to `1`, `Steps` to `8`, and advanced `Sigma Shift` to `5`
- They behave approximately the same as the SD3 Medium models, including same settings and all, other than harsher resource requirements and better quality.
- You can also use GGUF Versions of the models.
- Stable Diffusion 3.5 Medium is supported and works as normal.
- They behave approximately the same as the SD3 Medium models, including same settings and all.
- You can also use GGUF Versions of the models.
- SD 3.5 Medium supports resolutions from 512x512 to 1440x1440, and the model metadata of the official model recommends 1440x1440. However, the official model is not good at this resolution. You will want to click the `☰` hamburger menu on a model, then `Edit Metadata`, then change the resolution to `1024x1024` for better results. You can of course set the `Aspect Ratio` parameter to `Custom` and edit resolutions on the fly per-image.
(above image is AuraFlow v0.2)
- Fal.ai's AuraFlow v0.1 and v0.2 and v0.3 are supported in Swarm, but you must manually select architecture to use it.
- The model used "Pile T5-XXL" as its text encoder.
- The model used the SDXL VAE as its VAE.
- This model group was quickly forgotten by the community due to quality issues, but came back into popular attention much later via community finetune "Pony v7".
    - Pony wants to be in the `diffusion_models` folder, but regular AuraFlow goes in the `Stable-Diffusion` folder
- Parameters and usage are the same as any other normal model.
- CFG recommended around 3.5 or 4.
- Pony v7 allows higher resolutions than base AuraFlow normally targets.
- Black Forest Labs' Flux.1 model is fully supported in Swarm https://blackforestlabs.ai/announcing-black-forest-labs/
- Recommended: for best performance on modern nvidia cards, use Nunchaku models.
- These run twice as fast as the next best speed option (fp8) while using less memory too (close to gguf q4)
- Flux dev https://huggingface.co/mit-han-lab/nunchaku-flux.1-dev/tree/main
- Flux Schnell https://huggingface.co/mit-han-lab/nunchaku-flux.1-schnell/tree/main
- Use "fp4" for Blackwell (eg RTX 5090) or newer cards, use "int4" for anything older (4090, 3090, etc.)
- See the Nunchaku Support section for more info on this format
- Recommended: use the GGUF Format Files (best for most graphics cards)
- Flux Schnell https://huggingface.co/city96/FLUX.1-schnell-gguf/tree/main
- Flux Dev https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main
    - `Q6_K` is best accuracy on high VRAM, but `Q4_K_S` cuts VRAM requirements while still being very close to original quality; other variants shouldn't be used normally
- Goes in `(Swarm)/Models/diffusion_models`
- After adding the model, refresh the list, then you may need to click the `☰` hamburger menu on the model, then `Edit Metadata` and set the `Architecture` to `Flux Dev` or `Flux Schnell` as relevant, unless it detects correctly on its own.
- Alternate: the simplified fp8 file (best on 3090, 4090, or higher tier cards):
- Dev https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors
- Schnell https://huggingface.co/Comfy-Org/flux1-schnell/blob/main/flux1-schnell-fp8.safetensors
- Goes in your regular `(Swarm)/Models/Stable-Diffusion` dir
- or, not recommended: You can download BFL's original files:
- Download "Schnell" (Turbo) from https://huggingface.co/black-forest-labs/FLUX.1-schnell
- Or "Dev" (non-Turbo) from https://huggingface.co/black-forest-labs/FLUX.1-dev
- Goes in `(Swarm)/Models/diffusion_models`
- Required VAE & TextEncoders will be autodownloaded if you do not already have them, you don't need to worry about those.
- CFG Scale: For both models, use `CFG Scale` = `1` (negative prompt won't work).
    - For the Dev model, there is also a `Flux Guidance Scale` parameter under `Sampling`, which is a distilled embedding value that the model was trained to use.
    - Dev can use some slightly-higher CFG values (allowing for a negative prompt), possibly higher if you reduce the Flux Guidance value and/or use Dynamic Thresholding.
- Sampler: Leave it at default (Euler + Simple)
- Steps: For Schnell use Steps=4 (or lower, it can even do 1 step), for Dev use Steps=20 or higher
- Resolution: It natively supports any resolution up to 2MP (eg 1920x1088), and any aspect ratio thereof. By default, Swarm will use 1MP (1024x1024). You can take it down to 256x256 and still get good results.
- You can mess with the resolution quite a lot and still get decent results. It's very flexible even past what it was trained on.
- You can do a refiner upscale 2x and it will work but take a long time and might not have excellent quality.
    - Enable `Refiner Do Tiling` for any upscale target resolution above 1536x1536.
- Flux is best on a very high end GPU (eg 4090) for now. It is a 12B model.
- Smaller GPUs can run it, but will be slow. This requires a lot of system RAM (32GiB+). It's been shown to work as low down as an RTX 2070 or 2060 (very slowly).
- On a 4090, Schnell takes about 4-5 seconds to generate a 4-step image, very close to SDXL at 20 steps in time, but much higher quality.
- By default, Swarm will use fp8_e4m3fn for Flux. If you have a very very big GPU and want to use fp16/bf16, under `Advanced Sampling` set `Preferred DType` to `Default (16 bit)`
- The Flux.1 Tools announced here by BFL are supported in SwarmUI
- For "Redux", a Flux form of image prompting:
    - Download the Redux model to `(SwarmUI)/Models/style_models`
        - (Don't worry about sigclip, it is automanaged)
    - Drag an image to the prompt area
    - On the top left, find the `Image Prompting` parameter group
    - Set the `Use Style Model` parameter to the Redux model
    - There's an advanced `Style Model Apply Start` param to allow better structural control from your text prompt
        - Set it to 0.1 or 0.2 or so to have the text prompt guide structure before Redux takes over styling
        - At 0, the text prompt is nearly ignored
    - The advanced `Style Model Merge Strength` param lets you partially merge the style model against the nonstyled input, similar to Multiply Strength
    - The advanced `Style Model Multiply Strength` param directly multiplies the style model output, similar to Merge Strength
- For "Canny" / "Depth" models, they work like regular models (or LoRAs), but require an Init Image to function.
    - Goes in the regular `diffusion_models` or lora folder depending on which you downloaded.
    - You must input an appropriate image. So eg for the depth model, input a Depth Map.
        - You can use the controlnet parameter group to generate depth maps or canny images from regular images.
        - (TODO: Native interface to make that easier instead of janking controlnet)
    - Make sure to set Creativity to `1`.
    - This is similar in operation to Edit models.
- For "Fill" (inpaint model), it works like other inpaint models.
    - It's a regular model file, it goes in the regular `diffusion_models` folder same as other flux models.
    - "Edit Image" interface encouraged.
        - Mask a region and go.
    - Creativity `1` works well.
    - Larger masks recommended. Small ones may not replace content.
    - Boosting the `Flux Guidance Scale` way up to eg `30` may improve quality
- For "Kontext" (edit model), it works like other edit models.
- Model download here https://huggingface.co/Comfy-Org/flux1-kontext-dev_ComfyUI/blob/main/split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors
- Or the nunchaku version here https://huggingface.co/mit-han-lab/nunchaku-flux.1-kontext-dev/tree/main
- Or the official BFL 16 bit upload https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
- Or some GGUFs here https://huggingface.co/QuantStack/FLUX.1-Kontext-dev-GGUF/tree/main
    - It's a regular model file, it goes in the regular `diffusion_models` folder same as other flux models.
    - You will have to manually edit the architecture to be `Flux.1 Kontext Dev`, as it misdetects by default
        - Click the `☰` hamburger menu on the model, then `Edit Metadata`, then select the `Architecture` as `Flux.1 Kontext Dev`, then hit save
- Paste images into the prompt box to serve as the reference images it will use to generate.
- If you have an init image and no reference images, the init image will be used.
- Swarm will automatically keep the size of the image correct for Kontext input, but make sure your aspect ratio is matched.
- Kontext can take as many images as you want, but the way this works on the inside is a bit hacky and limited quality.
- Prompt should describe a change to make to the image.
- BFL published an official prompting guide here, following it carefully is recommended: https://docs.bfl.ai/guides/prompting_guide_kontext_i2i
- If you want to use the ACE Plus Models (Character consistency)
- Download the LoRAs from https://huggingface.co/ali-vilab/ACE_Plus/tree/main and save as normal loras
- Enable the Flux Fill model, enable the LoRA you chose
    - Set `Flux Guidance Scale` way up to `50`
    - Open an image editor for the image you want to use as an input (drag it to the center area, click Edit Image)
    - Set `Init Image Creativity` to 1 (max)
    - Change your `Resolution` parameters to have double the `Width` (eg 1024 input, double to 2048)
    - Add a Mask, draw a dot anywhere in the empty area (this is just a trick to tell the editor to automask all the empty area to the side, you don't need to mask it manually)
    - Type your prompt, hit generate
- Black Forest Labs' Flux.2 Models are supported in SwarmUI
- The main "Dev" model is an extremely massive model (32B diffusion model, 24B text encoder) that will demand significant RAM availability on your PC.
- This can easily fill up 128 gigs of system RAM in usage, but does still work on 64 gig systems. Lower than 64 may not be possible, or may require heavily using swapfile.
- The smaller Klein model is preferred for more normal PC hardware.
- Download the standard FP8 model here silveroxides/FLUX.2-dev-fp8_scaled
- Or GGUF version here city96/FLUX.2-dev-GGUF
    - Goes in the `diffusion_models` folder
    - There's also a turbo model here silveroxides/flux2-dev-turbo-fp8mixed.safetensors
        - or as a lora fal/Flux_2-Turbo-LoRA_comfyui.safetensors
        - Use scheduler `Align Your Steps` at exactly 8 steps to select the special Flux2-dev-turbo custom scheduler. The difference is minimal though.
- The VAE is a brand new 16x16 downsample VAE with 128 channels. It will be autodownloaded.
- You can swap it for this one CabalResearch/Flux2VAE-Anime-Decoder-Tune which is a finetune to reduce detail artifacting.
- The Text Encoder is 24B Mistral Small 3.2 (2506). It will be autodownloaded.
    - There's a GGUF about half the size here mcmonkey/flux2MistralGGUF
        - Select via the advanced `Mistral Model` parameter
- Parameters:
- Prompt: Prompting guide from the model creators here https://docs.bfl.ai/guides/prompting_guide_flux2
- Notably, they trained heavily on complex JSON structured prompts to allow for very complex scene control, though this is not required
- They used a powerful LLM for inputs, allow for multiple languages and a variety of ways of phrasing/formatting text to work out
- Resolution: Flux2 supports just about any resolution you can think of, from 64x64 up to 4 megapixels (2048x2048)
    - CFG Scale: `1`
    - Steps: They recommend 50, 20 still works but may have some quality reduction
    - Sigma Shift: Defaults to `2.02`
    - Flux Guidance Scale: Defaults to `3.5`, fiddling this value up a bit (to eg `4`) may be better
    - Sampler: Defaults to regular `Euler`
    - Scheduler: Defaults to `Flux2`, a new specialty scheduler added for Flux.2 to use, but it makes very little difference
    - Prompt Images: add up to a max of 6 images to the prompt box to be used as reference images. This uses significantly more memory.
- Klein is a smaller variant of Flux.2
    - It is lower quality than the full Flux.2-Dev, but runs much faster. Certain aspects of the quality can actually be better - notably, visual quality seems to have been tuned better - but overall intelligence is lower.
    - There is a 4B and a 9B variant; while the 9B is larger, the 4B often seems smarter.
- It uses a smaller text encoder (Qwen 4B for Klein 4B, and Qwen 8B for Klein 9B). It will be autodownloaded.
- Download Klein 4b here
- or a gguf here or base gguf here
- It has a distilled variant (Steps=8, CFG=1), and a "Base" variant (high steps, high CFG)
- or Klein 9b here (you may need to accept a license here)
- or a gguf here
- or klein 9b base here
- or a gguf here
- or klein 9b-kv cache version here
    - This version uses "KV Cache" to accelerate image editing.
    - Any model with KV Cache support MUST HAVE `9b-kv` in the filename. This is how Swarm detects and applies KV Cache behavior to the model. (There is no way to detect it automatically.)
    - This requires significantly more VRAM, so most people do not want this.
- Save the file into `diffusion_models`
- Broadly works the same as Flux.2-Dev
    - On the distilled model set `Steps` to `8`, on the base model use normal high step counts
    - On the distilled model set `CFG Scale` to `1`, on the base model use normal CFG eg `7`
    - They have an official prompting guide here
- Chroma is a derivative of Flux, and is supported in SwarmUI
- FP8 Scaled versions here: silveroxides/Chroma1-HD-fp8-scaled
- Or older revs Clybius/Chroma-fp8-scaled
- Or GGUF versions here: silveroxides/Chroma-GGUF
- Or original BF16 here (not recommended): lodestones/Chroma
- Model files go in `diffusion_models`
- Uses standard CFG, not distilled to 1 like other Flux models
- Original official reference workflow used Scheduler=`Align Your Steps` with Steps=`26` and CFG Scale=`4`
    - (It's named `Optimal Steps` in their workflow, but Swarm's AYS scheduler is equivalent to that)
    - "Sigmoid Offset" scheduler was their later recommendation, it requires a custom node
        - You can `git clone https://github.com/silveroxides/ComfyUI_SigmoidOffsetScheduler` into your ComfyUI `custom_nodes`, and then restart SwarmUI, and it will be available from the `Scheduler` param dropdown
    - Or, "power_shift" / "beta42" from ComfyUI_PowerShiftScheduler may be better
        - Works the same, `git clone https://github.com/silveroxides/ComfyUI_PowerShiftScheduler` into your ComfyUI `custom_nodes` and restart
- Generally works better with longer prompts. Adding some "prompt fluff" on the end can help clean it up. This is likely related to it being a beta model with an odd training dataset.
- Parameters
    - CFG Scale: around `3.5`
    - Sampler: Defaults to regular `Euler`
    - Scheduler: Defaults to `Beta`
    - Steps: Normal step counts work, official recommendation is `26`
    - Sigma Shift: Defaults to `1`
    - Resolution: `1024x1024` or nearby values. The HD models were trained extra on `1152x1152`.
- Chroma Radiance is a pixel-space model derived from Flux, and is supported in SwarmUI
- It is a work in progress, expect quality to be limited for now
- Download here lodestones/Chroma1-Radiance
    - Model files go in `diffusion_models`
- It does not use a VAE
- Parameters
    - CFG Scale: around `3.5`
    - Sampler: Defaults to regular `Euler`
    - Scheduler: Defaults to `Beta`
    - Steps: Normal step counts work, higher is recommended to reduce quality issues
    - Sigma Shift: Defaults to `1`. Set to `0` to explicitly remove shift and use the underlying comfy default behavior.
    - Prompt: Long and detailed prompts are recommended.
- Negative Prompt: Due to the model's experimental early train status, a good negative prompt is essential.
        - Official example: `This low quality greyscale unfinished sketch is inaccurate and flawed. The image is very blurred and lacks detail with excessive chromatic aberrations and artifacts. The image is overly saturated with excessive bloom. It has a toony aesthetic with bold outlines and flat colors.`
(Generated with the highest degree of image-text alignment preprompt, CFG=4, SigmaShift=6, Steps=20)
- Lumina 2 is an image diffusion transformer model, similar in structure to SD3/Flux/etc. rectified flow DiTs, with an LLM (Gemma 2 2B) as its input handler.
- It is a 2.6B model, similar size to SDXL or SD3.5M, much smaller than Flux or SD3.5L
- You can download the Comfy Org repackaged version of the model for use in SwarmUI here: https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/all_in_one/lumina_2.safetensors
- Or the `diffusion_models` variant https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/diffusion_models/lumina_2_model_bf16.safetensors (this version will by default load in fp8, and run a bit faster on 40xx cards)
- Because of the LLM input, you have to prompt it like an LLM.
    - This means `a cat` yields terrible results, instead give it: `You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. <Prompt Start> a cat` to get good results
    - Lumina's published reference list of prompt prefixes from source code:
        - `You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts. <Prompt Start>`
        - `You are an assistant designed to generate high-quality images based on user prompts. <Prompt Start>`
        - `You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. <Prompt Start>`
        - `You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. <Prompt Start>`
        - `You are an assistant designed to generate four high-quality images with highest degree of aesthetics arranged in 2x2 grids based on user prompts. <Prompt Start>`
    - You can absolutely make up your own though.
    - For longer prompts the prefix becomes less needed.
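The prefix convention above is just string concatenation in front of the user prompt; a minimal sketch (the helper name is hypothetical, the prefix text is one of Lumina's published reference prefixes):

```python
# One of Lumina 2's published reference prefixes, ending in the
# "<Prompt Start>" marker that separates the system text from the user prompt.
LUMINA_PREFIX = (
    "You are an assistant designed to generate superior images with the "
    "superior degree of image-text alignment based on textual prompts "
    "or user prompts. <Prompt Start> "
)

def lumina_prompt(user_prompt: str) -> str:
    """Prepend the LLM-style prefix that Lumina 2 expects."""
    return LUMINA_PREFIX + user_prompt

print(lumina_prompt("a cat"))
```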
- The model uses the Flux.1 VAE
- Parameters:
- CFG: 4 is their base recommendation
    - Sigma Shift: The default is `6` per the Lumina reference script, Comfy recommends `3` for use with lower step counts, so you can safely mess with this parameter if you want to. 6 seems to be generally better for structure, while 3 is better for fine details by sacrificing structure, but may have unwanted artifacts. Raising step count reduces some artifacts.
    - Steps: The usual 20 steps is fine, but the reference Lumina script uses 250(?!) by default (it has a weird sampler that is supposedly akin to Euler at 36 steps)
- Quick initial testing shows that raising steps high doesn't work any particularly different on this model than others, but the model at SigmaShift=6 produces some noise artifacts at regular 20 steps, raising closer to 40 cuts those out.
    - Renorm CFG: Lumina 2 reference code sets a new advanced parameter `Renorm CFG` to 1. This is available in Swarm under `Advanced Sampling`.
        - The practical difference is subjective and hard to predict, but enabling it seems to tend towards more fine detail
(Qwen Image ran at CFG=4, Steps=50, Res=1328x1328. This took me about 3 minutes per image. This comparison is unfair to the other models, but this model seems intended to be a 'slow but smart' model, so this is the way to run it for now. The test prompt seems to be particularly hard on Qwen Image, I promise it's smarter than this makes it look lol.)
- Qwen Image is natively supported in SwarmUI.
- Download the model here Comfy-Org/Qwen-Image_ComfyUI
- There's an fp8 and a bf16 variant available. The fp8 model is highly recommended.
- There's the 2512 version and the original available. 2512 is newer and better.
- Or, for nunchaku accelerated version that uses a bit less VRAM and runs faster, nunchaku-tech/nunchaku-qwen-image
- Or, other option for limited memory space, GGUF versions city96/Qwen-Image-gguf
- Or a distilled version here qwen_image_distill_full_fp8_e4m3fn
- This uses CFG=1, Steps=15 or so.
- There's also a couple "Lightning" loras lightx2v/Qwen-Image-Lightning for the base model, CFG=1 Steps=8 or 4
- Save it to `diffusion_models`
- The text encoder is Qwen 2.5 VL 7B (LLM), and will be automatically downloaded.
- It has its own VAE, and will be automatically downloaded.
- SageAttention has compatibility issues; if you use Sage, it will need to be disabled.
- CFG: You can use CFG=`1` for best performance. You can also happily use higher CFGs, eg CFG=4, at a performance cost.
- Steps: normal ~20 works, but higher steps (eg 50) is recommended for best quality
- Resolution: 1328x1328 is their recommended resolution, but you can shift it around to other resolutions in a range between 928 up to 1472.
- Performance: Can be fast on Res=928x928 CFG=1 Steps=20, but standard params are very slow (one full minute for a standard res 20 step cfg 4 image on a 4090, compared to ~10 seconds for Flux on the same).
- Requires >30 gigs of system RAM just to load at all in fp8. If you have limited sysram you're gonna have a bad time. Pagefile can help.
- Prompts: TBD, but it seems very friendly to general prompts in both natural language and booru-tag styles. Official recommendations are very long LLM-ish prompts though.
- Sigma Shift: Comfy defaults it to `1.15`, but this ruins fine details, so Swarm defaults it to `3` instead. Many different values are potentially valid. Proper guidance on choices TBD.
- There are three controlnet versions available for Qwen Image currently
- Regular form
- There's a regular controlnet-union available here InstantX/Qwen-Image-ControlNet-Union (be sure to rename the file when you save it)
- works like any other controlnet. Select as controlnet model, give it an image, select a preprocessor. Probably lower the strength a bit.
- Compatible with lightning loras.
- If not using Lightning, probably raise your CFG a bit to ensure your prompt is stronger than the controlnet.
- "Model Patch"
- Download here Comfy-Org/Qwen-Image-DiffSynth-ControlNets: model_patches
- Save to ControlNets folder
- Works the same as any other controlnet for basic usage, but advanced controls (eg start/stop steps) don't quite work
- LoRA form
- Download here Comfy-Org/Qwen-Image-DiffSynth-ControlNets: loras
- Save to loras folder
- Select the lora, use with a regular qwen image base model
- Upload a prompt image of controlnet input (depth or canny)
- You can create this from an existing image by using the Controlnet Parameter group, select the preprocessor (Canny, or MiDAS Depth), and hit "Preview"
- You cannot use the controlnet parameters directly for actual generation due to the weird lora-hack this uses
- Note that Qwen Image controlnets do not work the best on the Qwen Image Edit model.
- The Qwen Image Edit model can be downloaded here: Comfy-Org/Qwen-Image-Edit_ComfyUI (`qwen_image_edit_2511_fp8mixed` recommended currently)
- Or GGUF version here: unsloth/Qwen-Image-Edit-2511-GGUF or old 2509 QuantStack/Qwen-Image-Edit-2509-GGUF (or old version QuantStack/Qwen-Image-Edit-GGUF)
- Or nunchaku version here: nunchaku-qwen-image-edit-2509 (or old version nunchaku-qwen-image-edit)
- For original Edit or v2509, the architecture cannot be autodetected and must be set manually. 2511 can autodetect.
- Click the `☰` hamburger menu on a model, then `Edit Metadata`, then change `Architecture` to `Qwen Image Edit Plus` and hit `Save`
- For the original model (prior to 2509), use `Qwen Image Edit`
- Most params are broadly the same as regular Qwen Image
- CFG must be `1`; Edit is not compatible with higher CFGs normally (unless using an advanced alternate guidance option)
- Sigma Shift: `3` or lower (as low as `0.5`) is a valid range. Some users report that a value below 1 might be ideal for single-image inputs.
- You can insert image(s) to the prompt box to have it edit that image
- It will focus on the first image, but you can get it to pull features from additional images (with limited quality)
- Qwen Image Edit Plus works with up to 3 images well
- Use phrasing like `The person in Picture 1` to refer to the content of specific input images in the prompt
- There are a few samples of how to prompt here https://www.alibabacloud.com/help/en/model-studio/qwen-image-edit-api
- The `Smart Image Prompt Resizing` parameter (top-left, under Image Prompting) will resize your input images automatically. Turn this off if you've carefully sized your images in advance.
- Some versions of Qwen Edit require strict sizing to work well. 2511 reportedly works fine within a range of options.
- There are a couple dedicated Qwen Image Edit Lightning Loras lightx2v/Qwen-Image-Edit-2511-Lightning or for older copies lightx2v/Qwen-Image-Lightning
- Take care to separate the Edit lora vs the base Qwen Image lora.
- Hunyuan Image 2.1 is supported in SwarmUI.
- The main model's official original download here: tencent/HunyuanImage-2.1, save to `diffusion_models`
- FP8 download link pending
- Or GGUF: QuantStack/HunyuanImage-2.1-GGUF
- There is also a distilled variant, you can download here: Comfy-Org/HunyuanImage_2.1_ComfyUI.
- (The tencent upload does not work, use the linked upload)
- FP8 download link pending
- Or GGUF: QuantStack/HunyuanImage-2.1-Distilled-GGUF
- They also provide and recommend a Refiner model, you can download that here: hunyuanimage-refiner
- FP8 download link pending
- Or GGUF: QuantStack/HunyuanImage-2.1-Refiner-GGUF
- This naturally is meant to be used via the Refine/Upscale parameter group in Swarm.
- Set `Refiner Control Percentage` to `1`, set `Refiner Steps` to `4`, set `Refiner CFG Scale` to `1`
- You may also want to mess with the prompt; the official recommendation is some hacky LLM stuff: `<|start_header_id|>system<|end_header_id|>Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background: <|eot_id|><|start_header_id|>user<|end_header_id|> Make the image high quality<|eot_id|>`. You can use `<base> my prompt here <refiner> that llm junk here` in Swarm to automatically emit refiner-specific prompts.
- This specific model is not required. In fact, it's pretty bad. It can be replaced with other models of other architectures - pick the model with details you like and refine with that instead.
- Running the base model without a refiner works too, but fine detail quality is bad. You'll want to pick a refiner. (Possibly finetunes will fix the base in the future, as happened eg with SDXL Base years ago.)
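The `<base> ... <refiner> ...` prompt syntax mentioned above splits one prompt box into per-stage prompts. A rough sketch of the idea in Python (illustrative only, not SwarmUI's actual parser):

```python
def split_stage_prompts(prompt: str) -> dict:
    """Split a prompt using <base>/<refiner> section markers.

    Rough illustration of the idea only -- not SwarmUI's real implementation.
    """
    # Default: with no markers, both stages share the whole prompt
    sections = {"base": prompt, "refiner": prompt}
    if "<base>" in prompt and "<refiner>" in prompt:
        after_base = prompt.split("<base>", 1)[1]
        base_text, refiner_text = after_base.split("<refiner>", 1)
        sections["base"] = base_text.strip()
        sections["refiner"] = refiner_text.strip()
    return sections

parts = split_stage_prompts("<base> a red fox in snow <refiner> high quality, detailed")
print(parts["base"])     # a red fox in snow
print(parts["refiner"])  # high quality, detailed
```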
- CFG Scale: Normal CFG range, recommended around `3.5`. The distilled model is capable of CFG=1. The refiner requires CFG=1.
- Steps: Normal step values, around `20`. Refiner prefers `4`.
- Resolution: Targets `2048x2048`, can work at lower resolutions too.
- The VAE is a 32x32 downscale (vs the 8x8 most image models use), so it's a much smaller latent image than other models would have at this scale.
- 2048 on this model is the same latent size as 512 on other models.
- Sigma Shift: Default is `5`. Refiner defaults to `4`.
- TBD: Info specific to Distilled variant usage (doesn't seem to work well with their documented settings, testing TBD or comfy fix), and the dedicated Refiner model
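The latent-size comparison above is simple arithmetic: latent side length = pixel side length divided by the VAE downscale factor. A quick sketch:

```python
def latent_side(pixel_side: int, vae_downscale: int) -> int:
    """Latent grid side length for a given pixel side and VAE downscale factor."""
    return pixel_side // vae_downscale

# Hunyuan Image 2.1's VAE downscales 32x per side; most image models use 8x.
print(latent_side(2048, 32))  # 64
print(latent_side(512, 8))    # 64 -- the same latent grid size
```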
*(Example image: Steps=9, Z-Image Turbo)*
- Z-Image and Z-Image Turbo are supported in SwarmUI!
- It is a 6B-scale model, with both a strong Base and an official Turbo designed to run extremely fast while competing at the top level of image models
- "Edit" and "Omni" variants are still expected
- The "Turbo" model was the first version officially released download here Z-Image-Turbo-FP8Mixed
- Or the original BF16 fat version Comfy-Org/z_image_turbo
- Or GGUF version here jayn7/Z-Image-Turbo-GGUF
- Save in `diffusion_models`
- The "Base" model was released around 2 months later, download here (Pending: good fp8mixed)
- Or the original BF16 fat version Comfy-Org/z_image
- Save in `diffusion_models`
- Uses the Flux.1 VAE, will be downloaded and handled automatically
- You might prefer swapping to the UltraFlux VAE, which gets better photorealism quality (be sure to rename the file when you save it, eg `Flux/UltraFlux-vae.safetensors`)
- Parameters:
- Prompt: Supports general prompting in any format just fine. Speaks English and Chinese deeply, understands other languages decently well too.
- Sampler: Default is fine. Some users find `Euler Ancestral` can be better on photorealism detail. Comfy examples suggest `Res MultiStep`.
- Scheduler: Default is fine. Some users find `Beta` can be very slightly better.
- CFG Scale: For Turbo, `1`; for Base, normal CFG ranges (eg 4 or 7)
- Steps: For Turbo, small numbers are fine. `4` will work, `8` is better. For Base, 20+ steps as normal.
- The original Turbo repo suggests 5/9, but this appears redundant in Swarm.
- For particularly difficult prompts, raising Steps up to `20` on Turbo or `50` on Base may help get the full detail.
- Resolution: Side length `1024` is the standard, but anywhere up to `2048` is good. `512` noticeably loses some quality; above `2048` corrupts the image.
- Sigma Shift: Default is `3`; raising to `6` can yield stronger coherence.
- Here's a big ol' grid of Z-Image Turbo params: Z-Image MegaGrid
- There's a trick to get better seed variety in Z-Image Turbo: (This is less needed on Base)
- Add an init image (Any image, doesn't matter much - the broad color bias of the image may be used, but that's about it).
- Set Steps higher than normal (say 8 instead of 4)
- Set Init Image Creativity to a relatively high value (eg 0.7)
- Set Advanced Sampling -> Sigma Shift to a very high value like `22`
- Hit generate.
- (This basically just screws up the model in a way it can recover from, but the recovery makes it take very different paths depending on seed)
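The step bump in the trick above compensates for Init Image Creativity skipping part of the schedule. As a rough mental model (generic img2img behavior, an assumption rather than Swarm's exact internals), only about the final `creativity` fraction of the steps actually run:

```python
def approx_denoise_steps(total_steps: int, creativity: float) -> int:
    # Rough img2img approximation (an assumption, not Swarm's exact internals):
    # roughly the final `creativity` fraction of the schedule is actually run.
    return round(total_steps * creativity)

# The trick above: 8 steps at creativity 0.7 still runs ~6 real denoise steps,
# comparable to a normal 4-8 step Turbo generation.
print(approx_denoise_steps(8, 0.7))  # 6
```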
- There's a "DiffSynth Model Patch" controlnet-union available here alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1
- This goes in your regular ControlNets folder
- Comfy treats this as separate "model_patches"; to use Comfy folder format, add `;model_patches` to the end of Server Config -> Paths -> SDControlNetsFolder
- Proper Architecture ID is `Z-Image ControlNet (DiffPatch)`
- Works like any other controlnet. Select as controlnet model, give it an image, select a preprocessor. Fiddle the strength to taste.
- Despite being a Union controlnet, the Union Type parameter is not used.
- Because it is "Model Patch" based, the Start and End parameters also do not work.
- Kandinsky 5 Image Lite is supported in SwarmUI
- Also the video models, docs in the video model support doc
- There are multiple variants, pick one to download from here: kandinskylab/kandinsky-50-image-lite
- Parameters:
- CFG Scale: Regular CFG such as `5` works.
- Steps: Regular 20+ steps.
- Resolution: Side length 1024.
- Anima by Circlestone Labs is a 2B anime model built on Cosmos, and it is fully supported in SwarmUI.
- It is designed to be tiny, lightweight, fast, but built on a strong architecture.
- It is the first model architecture publicly released that was sponsored by Comfy Org!
- It is explicitly still in Preview status, they will be training it further before it's entirely ready.
- Download the preview version here
- Save in `diffusion_models`
- It uses a tiny Qwen 3 600M ("0.6B") text encoder. This will be autodownloaded.
- It uses the Qwen Image VAE. This will be autodownloaded.
- Parameters:
- Prompt: Trained on both booru-style tag prompts (`1girl`, etc) and natural language prompts. They have official specific writing guidance here
- CFG Scale: Regular CFG scales (eg `4`) work.
- Steps: Regular 20+ steps.
- Resolution: Side length 1024 recommended, but any lower value works too. Higher values do not work well. Refiner upscale needs tiling due to corruption at high res.
- Sampler: Defaults to `ER-SDE-Solver`, but all common samplers work. They officially recommend also trying out `Euler Ancestral` or `DPM++ 2M SDE`
- Scheduler: Default is fine (`Simple`), or you can experiment at will. The model is adaptable.
- Video models are documented in Video Model Support.
- You can use some (not all) Text2Video models as Text2Image models.
- Generally, just set Text2Video Frames to `1` and it will be treated as image gen.
- Some models may favor different parameters (CFG, Steps, Shift, etc.) for images vs videos.
- BnB NF4 and FP4 format models, such as this copy of Flux Dev lllyasviel/flux1-dev-bnb-nf4, are partially supported in SwarmUI automatically.
- The detection internally works by looking for `bitsandbytes__nf4` or `bitsandbytes__fp4` in the model's keys
- The first time you try to load a BnB-NF4 or BnB-FP4 model, it will give you a popup asking to install support
- This will autoinstall silveroxides/ComfyUI_bnb_nf4_fp4_Loaders which is developed by silveroxides, comfyanonymous, and lllyasviel, and is under the AGPL license.
- You can accept this popup, and it will install and reload the backend
- Then try to generate again, and it should work
- Note that BnB-NF4 and BnB-FP4 models have multiple compatibility limitations; even LoRAs don't apply properly.
- If you want a quantized flux model, GGUF is recommended instead.
- Support is barely tested, latest bnb doesn't work with comfy but old bnb is incompatible with other dependencies, good luck getting it to load.
- Seriously, just use GGUF or something. bnb is not worth it.
- GGUF Quantized `diffusion_models` models are supported in SwarmUI automatically.
- The detection is based on file extension.
- They go in `(Swarm)/Models/diffusion_models` and work similar to other `diffusion_models` format models
- Required VAE & TextEncoders will be autodownloaded if you do not already have them.
- The first time you try to load a GGUF model, it will give you a popup asking to install support
- This will autoinstall city96/ComfyUI-GGUF which is developed by city96.
- You can accept this popup, and it will install and reload the backend
- Then try to generate again, and it should just work
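Since detection here is purely by file extension, the check amounts to something like this sketch (the filenames are just examples):

```python
from pathlib import Path

def is_gguf_model(path: str) -> bool:
    """True if the file looks like a GGUF model, judged by extension alone."""
    return Path(path).suffix.lower() == ".gguf"

print(is_gguf_model("Models/diffusion_models/flux1-dev-Q4_K_S.gguf"))      # True
print(is_gguf_model("Models/diffusion_models/flux1-dev-fp8.safetensors"))  # False
```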
- MIT Han Lab's "Nunchaku" / 4-bit SVDQuant models are an unusual quant format that is supported in SwarmUI.
- Nunchaku is a very dense quantization of models (eg 6GiB for Flux models) that runs very fast (4.4 seconds for a 20 step Flux Dev image on Windows RTX 4090, vs fp8 is ~11 seconds on the same)
- It is optimized for modern nvidia GPUs, with different optimizations per gpu generation
- RTX 30xx and 40xx cards need "int4" format nunchaku models
- RTX 50xx or newer cards need "fp4" format nunchaku models
- They go in `(Swarm)/Models/diffusion_models` and work similar to other `diffusion_models` format models
- Make sure you download a "singlefile" nunchaku file, not a legacy "SVDQuant" folder
- Required VAE & TextEncoders will be autodownloaded if you do not already have them.
- For the older "SVDQuant" folder models mit-han-lab/svdquant, the detection is based on the folder structure: you need the files `transformer_blocks.safetensors` and `comfy_config.json` inside the folder. You cannot have unrelated files in the folder.
- The first time you try to load a Nunchaku model, it will give you a popup asking to install support
- This will autoinstall mit-han-lab/ComfyUI-nunchaku and its dependencies
- You can accept this popup, and it will install and reload the backend
- Then try to generate again, and it should just work
- Nunchaku has various compatibility limitations due to hacks in the custom nodes. Not all lora, textenc, etc. features will work as intended.
- It does not work on all python/torch/etc. versions, as they have deeply cursed dependency distribution
- The `Nunchaku Cache Threshold` param is available to enable block-caching, which improves performance further at the cost of quality.
- TensorRT support (`.engine`) is available for SDv1, SDv2-768-v, SDXL Base, SDXL Refiner, SVD, SD3-Medium
- TensorRT is an nvidia-specific accelerator library that provides faster SD image generation at the cost of reduced flexibility. Generally this is best for heavy usage, especially for API/Bots/etc., and less useful for regular individual usage.
- You can generate TensorRT engines from the model menu. This includes a button on-page to autoinstall TRT support the first time you use it, and configuration of graph size limits and optimal scales. (TensorRT works fastest when you generate at the selected optimal resolution, and slightly less fast at any dynamic resolution outside the optimal setting.)
- Note that TensorRT is not compatible with LoRAs, ControlNets, etc.
- Note that you need to make a fresh TRT engine for any different model you want to use.
These obscure/old/bad/unpopular/etc. models have been moved to Obscure Model Support








