10 utility nodes for generating long videos in ComfyUI by splitting them into overlapping chunks with rolling reference frames.
Designed to solve the "context window reversal" problem — where video generation models (Wan 2.1/2.2, FantasyPortrait, etc.) revert to the starting state after ~135 frames because the reference image embedding pulls the generation back.
Now with per-chunk text prompting — change the narrative as your video progresses.
Most image-to-video models have a limited context window (~81–137 frames). When you try to generate longer videos, the model "forgets" the accumulated motion and snaps back to looking like frame 1. The result is an obvious loop or identity shift.
Rolling Reference: Generate video in chunks, using the last frame of each chunk as the reference image for the next chunk. This keeps identity consistent while allowing pose/action to progress naturally.
Chunk 0: [ref=original portrait] → 81 frames → last frame becomes next ref
Chunk 1: [ref=chunk 0 last frame] → 81 frames → last frame becomes next ref
Chunk 2: [ref=chunk 1 last frame] → 81 frames → ...
Final: Concatenate all chunks → 240+ frame seamless video
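A minimal sketch of that loop in Python, assuming a hypothetical `i2v_generate` function standing in for whatever image-to-video sampler you drive:

```python
import torch

def rolling_reference(ref_image: torch.Tensor, num_chunks: int,
                      chunk_frames: int = 81) -> torch.Tensor:
    chunks, ref = [], ref_image
    for i in range(num_chunks):
        # `i2v_generate` is a placeholder for your I2V model call;
        # it returns a [frames, H, W, C] image batch.
        video = i2v_generate(ref, num_frames=chunk_frames)
        if i > 0:
            video = video[1:]  # drop the duplicate of the previous reference
        chunks.append(video)
        ref = video[-1:]       # last frame becomes the next chunk's reference
    return torch.cat(chunks, dim=0)
```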
| Node | Purpose |
|---|---|
| Extract Video Chunk | Pull chunk N from a longer driving video, with configurable overlap |
| Blend Video Chunks (Crossfade) | Crossfade-blend two overlapping pixel-space chunks into one seamless video |
| Blend Latent Chunks (Pre-Decode) | Join two latent chunks before VAE decode — supports slerp, hard_cut, crossfade |
| Concat Video Chunks | Simple concatenation with optional first-frame trim (for rolling reference) |
| Get Frame By Index | Extract a single frame by index (-1 = last frame = next reference) |
| Get Frame Range | Extract a range of frames with negative indexing support |
| Video Chunk Planner | Calculate chunking strategy — shows chunk count, frame ranges, workflow steps |
Wan-Specific Nodes (2 nodes — require ComfyUI-WanVideoWrapper)
| Node | Purpose |
|---|---|
| Wan Chunked I2V Sampler ♾️ | All-in-one node: encode→sample→decode→extract ref→repeat. Supports per-chunk text prompting. |
| Wan Chunk Calculator 🧮 | Calculate exact total_frames for N chunks with 4n+1 normalization |
| Node | Purpose |
|---|---|
| Chain Text Embeds 🔗 | Chain multiple WanVideoTextEncode outputs into an ordered sequence for per-chunk text conditioning |
The Wan nodes gracefully degrade — if WanVideoWrapper isn't installed, the 7 core nodes still load and work fine.
Search for VideoChunkTools in ComfyUI Manager and click Install.
cd ComfyUI/custom_nodes
git clone https://github.com/gregtee2/ComfyUI_VideoChunkTools.git
Alternatively, download the ZIP from GitHub and extract it to ComfyUI/custom_nodes/ComfyUI_VideoChunkTools/.
Restart ComfyUI after installation.
No pip dependencies required — uses only PyTorch (already in ComfyUI).
Use the core utility nodes to build a rolling-reference pipeline with any image-to-video model:
- Video Chunk Planner → See how many chunks you need
- Extract Video Chunk (index=0) → Get first driving chunk
- Run your I2V model with your original reference image
- Get Frame By Index (index=-1) → Extract last frame as new reference
- Extract Video Chunk (index=1) → Get next driving chunk
- Run your I2V model with the new reference
- Blend Video Chunks or Concat Video Chunks → Join the results
- Repeat for each chunk
The Wan Chunked I2V Sampler handles everything in a single node:
- Connect your Wan model, VAE, and start image
- Set `total_frames` (e.g., 241 for ~15 seconds at 16fps)
- Set `chunk_frames` (e.g., 81)
- Hit Queue — the node generates all chunks automatically

Features:
- Single-pass or two-pass sampling (connect `model_b` for split denoising)
- FLF (First-Last-Frame) — connect an `end_image` to guide the final frame
- Multi-keyframe FLF — provide a batch of end images to distribute across chunks
- Crossfade overlap — set `end_blend_chunks` for smooth FLF transitions
- Auto 4n+1 normalization — chunk sizes are automatically adjusted for Wan's requirements
- Per-chunk text prompts — connect a `ChainTextEmbeds` node to change the text conditioning per chunk
Change the narrative as your video progresses — each chunk can have its own text prompt:
- Add multiple WanVideoTextEncode nodes, each with a different prompt
- Connect them to a Chain Text Embeds 🔗 node (`embed_1`, `embed_2`, `embed_3`, ...)
- Connect the `embed_sequence` output to the sampler's `text_embed_sequence` input
- Chunk 1 uses embed_1, chunk 2 uses embed_2, etc.
- If you have fewer prompts than chunks, the last prompt repeats for remaining chunks
[WanVideoTextEncode: "A cat sleeps on a sofa"]──┐
[WanVideoTextEncode: "The cat wakes up"]─────────┼──▶ [Chain Text Embeds 🔗]──▶ text_embed_sequence
[WanVideoTextEncode: "The cat jumps off"]────────┘
│
[Wan Model + VAE + Image]──────────────────────────▶ [Wan Chunked I2V Sampler ♾️]
Tip: You can still use the single `text_embeds` input if you want the same prompt for all chunks. The sequence input takes priority when connected.
Divides a video into chunks with overlap. Adjacent chunks share overlap_frames at their boundary.
| Input | Type | Default | Description |
|---|---|---|---|
| images | IMAGE | — | Full driving video |
| chunk_index | INT | 0 | Which chunk to extract (0-based) |
| chunk_frames | INT | 81 | Frames per chunk |
| overlap_frames | INT | 16 | Frames shared between adjacent chunks |
| Output | Type | Description |
|---|---|---|
| chunk | IMAGE | Extracted chunk |
| total_chunks | INT | How many chunks cover the full video |
| chunk_index | INT | Pass-through for chaining |
| is_last_chunk | BOOLEAN | True if this is the final chunk |
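For intuition, here is one plausible way the chunk boundaries fall out of these inputs (a sketch; the node's internal math may differ):

```python
import math

def plan_chunks(total_frames: int, chunk_frames: int = 81, overlap_frames: int = 16):
    stride = chunk_frames - overlap_frames          # new frames per chunk
    total_chunks = max(1, math.ceil((total_frames - overlap_frames) / stride))
    for i in range(total_chunks):
        start = i * stride
        yield i, start, min(start + chunk_frames, total_frames)

# 240 frames, 81-frame chunks, 16-frame overlap ->
# chunk 0: [0, 81), chunk 1: [65, 146), chunk 2: [130, 211), chunk 3: [195, 240)
```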
Crossfade two overlapping pixel-space chunks. The last N frames of chunk_a smooth-transition into the first N frames of chunk_b.
| Input | Type | Default | Description |
|---|---|---|---|
| chunk_a | IMAGE | — | First (earlier) chunk |
| chunk_b | IMAGE | — | Second (later) chunk |
| overlap_frames | INT | 16 | Frames to crossfade |
| blend_curve | ENUM | ease_in_out | linear, ease_in_out, sigmoid |
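Roughly what the crossfade computes, sketched in PyTorch (the exact curve shapes are assumptions here):

```python
import torch

def crossfade(chunk_a: torch.Tensor, chunk_b: torch.Tensor,
              overlap: int = 16, curve: str = "ease_in_out") -> torch.Tensor:
    t = torch.linspace(0.0, 1.0, overlap)       # per-frame blend weight
    if curve == "ease_in_out":
        t = t * t * (3.0 - 2.0 * t)             # smoothstep
    elif curve == "sigmoid":
        t = torch.sigmoid((t - 0.5) * 10.0)     # steep S-curve (not exactly 0..1)
    w = t.view(-1, 1, 1, 1)                     # broadcast over H, W, C
    blended = chunk_a[-overlap:] * (1 - w) + chunk_b[:overlap] * w
    return torch.cat([chunk_a[:-overlap], blended, chunk_b[overlap:]], dim=0)
```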
Join two 5D video latents along the temporal dimension before VAE decode. Operates in latent space — overlap is in latent frames (Wan: pixel_overlap / 4).
| Input | Type | Default | Description |
|---|---|---|---|
| latent_a | LATENT | — | First latent chunk |
| latent_b | LATENT | — | Second latent chunk |
| overlap_frames | INT | 4 | Overlap in latent temporal frames |
| blend_curve | ENUM | hard_cut | hard_cut, slerp, linear, ease_in_out, sigmoid |
hard_cut (recommended for rolling reference): Clean cut at the overlap midpoint — no dissolve artifacts.
slerp: Spherical linear interpolation — preserves latent vector magnitude. Standard technique for diffusion model interpolation.
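For reference, a standard slerp between two latent tensors looks like this (a sketch, not the node's exact code):

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    a_flat, b_flat = a.flatten(), b.flatten()
    cos = torch.dot(a_flat, b_flat) / (a_flat.norm() * b_flat.norm())
    omega = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))  # angle between latents
    so = torch.sin(omega)
    if so.abs() < 1e-6:
        return (1 - t) * a + t * b  # nearly parallel: plain lerp is safe
    # Weighted by sines so the interpolated vector keeps its magnitude.
    return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
```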
Simple concatenation, intended for rolling-reference workflows where chunk B's first frame naturally matches chunk A's last frame.
| Input | Type | Default | Description |
|---|---|---|---|
| chunk_a | IMAGE | — | First chunk |
| chunk_b | IMAGE | — | Second chunk |
| trim_b_start | INT | 1 | Frames to trim from B's start (1 = drop duplicate ref frame) |
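The operation itself is one line; a sketch of the trim behavior:

```python
import torch

def concat_chunks(chunk_a: torch.Tensor, chunk_b: torch.Tensor,
                  trim_b_start: int = 1) -> torch.Tensor:
    # With rolling reference, B's first frame duplicates A's last frame,
    # so dropping one frame from B avoids a visible stutter.
    return torch.cat([chunk_a, chunk_b[trim_b_start:]], dim=0)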
Extract a single frame. Use -1 for the last frame (rolling reference).
Extract a range of frames. Supports negative indexing. end=0 means "to the end".
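Both nodes boil down to tensor indexing on ComfyUI's [frames, height, width, channels] IMAGE layout; a sketch:

```python
import torch

def get_frame(images: torch.Tensor, index: int = -1) -> torch.Tensor:
    # Negative indexing works as in Python: -1 is the last frame,
    # which is exactly the next rolling reference.
    return images[index].unsqueeze(0)   # keep the batch dim -> valid IMAGE

def get_frame_range(images: torch.Tensor, start: int = 0, end: int = 0) -> torch.Tensor:
    # end=0 mirrors the node's "to the end" convention.
    return images[start:] if end == 0 else images[start:end]
```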
Outputs the total number of chunks needed and a detailed text plan showing frame ranges and workflow steps.
All-in-one node for Wan I2V models. See the Workflows section above.
Simple math: calculates total_frames = chunk_frames + (num_chunks - 1) * (chunk_frames - 1) with 4n+1 normalization.
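Spelled out in Python (the rounding direction of the 4n+1 snap is an assumption):

```python
def wan_total_frames(num_chunks: int, chunk_frames: int = 81) -> int:
    chunk_frames = 4 * ((chunk_frames - 1) // 4) + 1   # snap to 4n+1
    # Each later chunk re-uses the previous last frame, so it adds
    # chunk_frames - 1 new frames.
    return chunk_frames + (num_chunks - 1) * (chunk_frames - 1)

# 3 chunks of 81 frames -> 81 + 2 * 80 = 241 (~15 s at 16fps)
```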
Chains up to 8 pre-encoded text embeddings into an ordered sequence for per-chunk text conditioning.
| Input | Type | Required | Description |
|---|---|---|---|
| embed_1 | WANVIDEOTEXTEMBEDS | Yes | Text embedding for chunk 1 |
| embed_2–embed_8 | WANVIDEOTEXTEMBEDS | No | Text embeddings for chunks 2–8 |
| Output | Type | Description |
|---|---|---|
| embed_sequence | TEXT_EMBED_SEQUENCE | Ordered list of embeddings — connect to sampler's text_embed_sequence input |
If you have fewer embeds than chunks, the last embed repeats for all remaining chunks. Non-connected slots are skipped (embed_1 + embed_3 = 2-entry sequence).
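The fallback rule amounts to clamping the chunk index into the sequence; a sketch:

```python
def embed_for_chunk(embed_sequence: list, chunk_index: int):
    # Chunks past the last provided embed re-use the final entry,
    # matching the "last embed repeats" behavior described above.
    return embed_sequence[min(chunk_index, len(embed_sequence) - 1)]
```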
- Overlap of 16 frames works well for most cases (about 1 second at 16fps)
- ease_in_out blend curve gives the smoothest pixel-space transitions
- hard_cut in latent space is usually best — rolling reference already makes the overlap zone match
- slerp is the gold standard for latent interpolation if you need blending
- For Wan models, chunk_frames must be 4n+1 (5, 9, 13, ..., 77, 81, 85, ...). The nodes auto-normalize this.
- Use Video Chunk Planner first to understand how your video will be divided
- ComfyUI (any recent version)
- PyTorch (included with ComfyUI)
- ComfyUI-WanVideoWrapper — only needed for the 2 Wan-specific nodes. The 7 core nodes work without it.
MIT — see LICENSE
Built by Greg Tee for the ComfyUI community.