Diffusion Models, Denoised
How do you teach a network to draw something it has never seen? You teach it to undo a mess. A surprisingly elegant idea, and the engine behind most of the image generators in use today.
The five-bullet version
- A diffusion model is trained to undo gradual noise added to images.
- Training: take a clean image, add noise, ask the model to predict the noise. Repeat.
- Inference: start with pure noise, predict-and-subtract dozens of times until something appears.
- The model never sees the clean image at inference — only its own intermediate guesses.
- Modern variants (latent diffusion, flow matching) keep the recipe and change the geometry.
§ 00 · THE BIG IDEA · Generation as un-corruption
Imagine you have a clean photograph. You add a tiny bit of static — barely noticeable. Then you add more. And more. After a thousand small steps, the photograph is unrecognizable, indistinguishable from television noise. Now imagine the reverse: starting from pure static, can you remove just enough noise, just slightly, to recover the photograph?
This is a diffusion model: a class of generative models that learn to reverse a gradual noising process, sampling by starting from pure noise and iteratively denoising. You don’t teach it to draw a cat. You teach it to look at a slightly noisy cat and predict the noise. Then at inference, you start with pure noise, ask the model to predict the noise inside it, subtract a small amount of that prediction, and repeat. After a few dozen steps, what’s left is something that wasn’t there before.
Reverse Diffusion, Step by Step
Drag the slider from right to left — from pure Gaussian noise to a clean image. This is roughly what a diffusion model does, but where you scrub a slider, the model predicts the noise to remove at each step.
What you scrubbed by hand is the forward process — the corruption. The model’s job is the reverse. Let’s separate the two.
§ 01 · FORWARD PROCESS · The corruption schedule
The forward process is fixed and not learned. It’s a recipe: at each timestep t, add a small amount of Gaussian noise to the image. The amount is set by a schedule — typically the noise grows slowly at first, then faster.
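A minimal schedule can be written in a few lines. This sketch uses a linear per-step variance (the endpoints 1e-4 and 0.02 are the values from the original DDPM paper; any monotone schedule works):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variance, growing over time
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # fraction of the original signal left at step t

# alpha_bar starts near 1 (almost clean) and decays toward 0 (pure noise):
# the image loses signal slowly at first, then faster.
```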
A useful trick: because Gaussian noise added to Gaussian noise is still Gaussian, you don’t have to actually run all t steps to find out what the image looks like at step t. There’s a closed-form expression. You can jump directly to any noise level. This makes training trivial: pick a random t, jump there, ask the model to predict the noise.
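The closed-form jump makes one training example a few lines of NumPy. A toy sketch (the network that would consume this pair is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise schedule: beta_t rises linearly; alpha_bar_t is the cumulative signal kept.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def training_example(x0, alpha_bar, rng):
    """Build one (noisy image, true noise, timestep) training triple.

    Uses the closed form x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps,
    so no sequential noising is needed.
    """
    t = rng.integers(len(alpha_bar))         # pick a random noise level
    eps = rng.standard_normal(x0.shape)      # the noise the network must predict
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps, t

x0 = rng.standard_normal((8, 8))             # stand-in for a clean image
x_t, eps, t = training_example(x0, alpha_bar, rng)
# The loss would be the mean squared error between model(x_t, t) and eps.
```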
§ 02 · REVERSE PROCESS · The part the network does
Reversing is the hard part. Given a noisy image at step t, what was the slightly-less-noisy image at step t−1? In general this is impossible — many clean images could have produced the same noise. But you can ask the easier question: given this noisy image, what was the noise?
That question is well-posed, and a neural network can be trained to answer it. The architecture is usually a U-Net (a convolutional architecture with skip connections between a downsampling encoder path and an upsampling decoder path, originally built for medical image segmentation), although Diffusion Transformers (DiTs), which scale more cleanly than U-Nets at large parameter counts, have become standard at the frontier.
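Given such a noise-predicting network, sampling is a loop. Here is a DDPM-style ancestral sampling sketch, with a dummy lambda standing in for the trained model:

```python
import numpy as np

def sample(model, shape, betas, rng):
    """DDPM ancestral sampling sketch.

    `model(x, t)` predicts the noise in x at timestep t; a real model
    would be a trained U-Net or DiT, not the toy used below.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps_hat = model(x, t)                # predicted noise at this level
        # Posterior mean: subtract the scaled noise estimate, then rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                            # add fresh noise except at the last step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Toy "model" that always predicts zero noise, just to run the loop end to end.
rng = np.random.default_rng(0)
out = sample(lambda x, t: np.zeros_like(x), (4, 4), np.linspace(1e-4, 0.02, 50), rng)
```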
“Predicting the noise is equivalent to predicting the score — the gradient of the log-density of the data.” — Yang Song, Score-Based Generative Modeling, 2020
§ 03 · WHAT THE NETWORK LEARNS · A field of arrows
Here’s a way to think about it. The space of all possible images is unimaginably vast. Most of it is noise. A vanishingly small region holds plausible photos. The diffusion model has learned, for any point in this enormous space, an arrow pointing toward the plausible region. To generate, you start anywhere and follow the arrows.
That arrow field is what “the model knows.” It’s why diffusion models can be guided — you can nudge the arrows during inference using a text prompt, a sketch, a pose, a depth map. The arrows themselves don’t change; you just bias which arrow gets followed at each step.
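For toy distributions, the arrow field has a closed form. For a single 2-D Gaussian, the score (the gradient of the log-density) points straight at the mean, and following the arrows walks any starting point home:

```python
import numpy as np

# The "arrow field" for the simplest possible data distribution: a single
# 2-D isotropic Gaussian centered at mu with variance sigma2.
mu = np.array([1.0, -2.0])
sigma2 = 0.5

def score(x):
    """Gradient of log N(x; mu, sigma2 * I): an arrow pointing at the mean."""
    return -(x - mu) / sigma2

# Start anywhere and follow the arrows with small steps.
x = np.array([3.0, 0.0])
for _ in range(100):
    x = x + 0.1 * score(x)
# x is now (numerically) at the mode of the distribution.
```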
Both the arrow-following at sample time and the gradient descent that created those arrows during training are the same operation in different costumes. To make this concrete, here’s a 2D toy of the optimization that built the model in the first place. Drop a marker anywhere; the deeper valleys are the configurations the network considers more plausible. Watch which one it falls into.
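A 1-D stand-in for that toy, assuming a double-well landscape with valleys at ±1: plain gradient descent, where the starting point decides which valley the marker falls into.

```python
def loss(x):
    """A double-well "landscape" with minima at x = -1 and x = +1."""
    return (x**2 - 1.0) ** 2

def grad(x):
    """Derivative of the loss with respect to x."""
    return 4.0 * x * (x**2 - 1.0)

def descend(x, lr=0.05, steps=200):
    """Gradient descent: repeatedly step downhill from the starting point."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# A marker dropped at 0.3 falls into the right valley; one dropped at -2.0
# falls into the left valley.
right = descend(0.3)    # close to +1
left = descend(-2.0)    # close to -1
```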
§ 04 · WHERE THIS IS GOING · Faster, sharper, conditional
The 1000-step DDPM of 2020 has been replaced, in production, by 4-step or even 1-step samplers (consistency models, flow matching, rectified flow). The models also got bigger and learned to read text prompts via cross-attention. Video diffusion is the current frontier — a video is just an image with a time axis, and the same machinery works, expensively.
References & further reading
- Sohl-Dickstein, J. et al. (2015). Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. arXiv:1503.03585.
- Ho, J., Jain, A. & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. arXiv:2006.11239.
- Song, Y. et al. (2021). Score-Based Generative Modeling Through Stochastic Differential Equations. arXiv:2011.13456.
- Ho, J. & Salimans, T. (2021). Classifier-Free Diffusion Guidance. arXiv:2207.12598.
- Weng, L. (2021). What are Diffusion Models? Lil’Log.
- Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752.
- Karras, T. et al. (2022). Elucidating the Design Space of Diffusion-Based Generative Models. arXiv:2206.00364.
- Peebles, W. & Xie, S. (2023). Scalable Diffusion Models with Transformers (DiT). arXiv:2212.09748.
§ · GOING DEEPER · Three threads worth following
The mathematical scaffolding underneath diffusion is more general than “add noise, predict noise.” The same objective can be derived from a variational lower bound (DDPM’s original framing), from a denoising score-matching objective (Song & Ermon), and from a stochastic differential equation (Song et al. 2020). Once you see the SDE formulation, it’s clear why the field has so many samplers — Euler, DDIM, Heun, DPM-Solver — they’re different numerical solvers for the same reverse-time ODE.
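To make the solver point concrete, here is the deterministic DDIM update, sketched under the usual epsilon-prediction parameterization: reconstruct the current estimate of the clean image, then re-noise it to the next (lower) level, with no fresh randomness injected.

```python
import numpy as np

def ddim_step(x_t, eps_hat, ab_t, ab_prev):
    """One deterministic DDIM step (the eta = 0 case).

    ab_t and ab_prev are the cumulative alpha-bar values at the current
    and target noise levels. Because no noise is injected, the same start
    always yields the same sample.
    """
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)  # est. clean image
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat

# Sanity check: with the *true* noise, one step lands exactly on the
# closed-form forward-process image at the lower noise level.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
ab_t, ab_prev = 0.5, 0.8
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev)
```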
The other big lever for real-world image quality is classifier-free guidance: at inference, run the model twice per step — once with a text condition, once without — and push the result in the conditioned direction. It costs you 2× compute per step but dramatically improves fidelity and prompt adherence. And the move from pixel-space to latent-space diffusion (Stable Diffusion) was the engineering decision that made text-to-image affordable on consumer GPUs. None of these are conceptual breaks from the basic recipe; they’re what made it practical.
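The guidance combination itself is one line. A sketch, where the weight w is a sampler hyperparameter whose useful range depends on the model:

```python
import numpy as np

def cfg(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate beyond the conditional prediction.

    w = 0 ignores the prompt, w = 1 is plain conditional sampling, and
    w > 1 pushes harder toward the prompt at some cost in diversity.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy noise predictions from the two forward passes of the same model.
eps_c = np.array([1.0, 0.0])   # with the text condition
eps_u = np.array([0.0, 0.0])   # without it
guided = cfg(eps_c, eps_u, 2.0)
```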
Original figures live in the linked sources — open the papers for the canonical visuals in their full context.