This repository provides a from-scratch implementation of latent diffusion model training with the goal of gaining a deeper understanding of diffusion architectures (e.g., U-Net, DiT) and scheduling methods. Both the model and diffusion process are custom-implemented and trained from the ground up. All experiments are constrained to a single NVIDIA RTX 5090 GPU (32 GB VRAM), so the training process is optimized for efficiency under limited compute. The objective is to achieve the highest possible model quality within this constraint.
This project is currently a work in progress. So far, I’ve trained a baseline Stable Diffusion model using a custom trainer, with custom model and scheduler implementations planned next.
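As a rough sketch of the kind of forward (noising) process a custom scheduler implements — assuming the standard DDPM linear beta schedule; all names and default values below are illustrative, not this repository's actual API:

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced per-step noise variances, as in the original DDPM setup.
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]

def alpha_bar(betas):
    # Cumulative product of (1 - beta_t): fraction of signal retained at step t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0, t, alpha_bars, rng=random):
    # q(x_t | x_0): scale the clean latent and mix in Gaussian noise.
    a = alpha_bars[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for xi in x0]
```

At `t = 0` the latent is almost unchanged, while by the final step `alpha_bar` is near zero and the sample is close to pure Gaussian noise — the denoiser (U-Net or DiT) is trained to invert this process.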
- https://arxiv.org/abs/2212.09748
- https://github.com/facebookresearch/DiT
- https://huggingface.co/facebook/DiT-XL-2-256
```shell
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -e .
```
```shell
ruff format llm
```
Set an alternate location for the Hugging Face home directory (`HF_HOME`), e.g. to keep model and dataset caches off the system drive:

```shell
conda env config vars set HF_HOME=/media/bryan/ssd01/huggingface
conda env config vars list
```
- https://github.com/CompVis/latent-diffusion
- https://github.com/Stability-AI/stablediffusion
- https://laion.ai/blog/large-openclip/ (OpenCLIP)
- https://github.com/CompVis/stable-diffusion
- https://github.com/huggingface/diffusers
- https://github.com/facebookresearch/xformers
- https://github.com/CompVis/taming-transformers