Equilibrium beats Flow: Better Way to Train Diffusion Model

Key Takeaways

From now on, when training a diffusion-style generative model, consider Equilibrium Matching (EqM), which learns the equilibrium (static) gradient of an implicit energy landscape, instead of Flow Matching, which learns a non-equilibrium velocity field that varies over time.

Methods

  • Flow Matching (FM)

    • $L_{FM} = \left(f(x_t, t) - (x - \epsilon)\right)^2$
      • $t$ is a timestep sampled uniformly between 0 and 1, $\epsilon$ is Gaussian noise, $x$ is a sample from the training set, and $x_t$ is the noisy interpolation of $x$ and $\epsilon$ at time $t$
    • $L_{\text{uncond-FM}} = \left(f(x_t) - (x - \epsilon)\right)^2$
      • Dropping the time input can also learn equilibrium dynamics, but doing so degrades generation quality
  • Equilibrium Matching (EqM)

    • $L_{\text{EqM}} = \left(f(x_\gamma) - (\epsilon - x)\,c(\gamma)\right)^2$
      • $\gamma$ is an interpolation factor sampled uniformly between 0 and 1, but unlike $t$ in FM, $\gamma$ is implicit and never seen by the model; $x_\gamma$ is the interpolation of $x$ and $\epsilon$ at $\gamma$, and $c(\gamma)$ is a positive factor that controls the target gradient magnitude (a minimal sketch of both training losses follows after this list)

  • [Optional] Explicit Energy Model

    • $L_{\text{EqM-E}} = \left(\nabla g(x_\gamma) - (\epsilon - x)\,c(\gamma)\right)^2$
      • $g$ is an explicit energy model that outputs a scalar energy value; there are two ways to construct it from an existing Equilibrium Matching model $f$ without introducing new parameters (sketched in code after this list):
        • Dot Product: $g(x_\gamma) = x_\gamma \cdot f(x_\gamma)$
        • Squared L2 Norm: $g(x_\gamma) = -\tfrac{1}{2}\lVert f(x_\gamma) \rVert_2^2$
  • Sampling with Gradient Descent Optimizers

    • Gradient Descent Sampling (GD): $x_{k+1} \leftarrow x_k - \eta \nabla E(x_k)$
    • Nesterov Accelerated Gradient (NAG-GD): $x_{k+1} \leftarrow x_k - \eta \nabla E\big(x_k + \mu (x_k - x_{k-1})\big)$; any other similar optimizer, such as Adam, should also work
    • $E$ may be learned implicitly ($\nabla E(x) = f(x)$) or explicitly ($\nabla E(x) = \nabla g(x)$)
    • Sampling with Adaptive Compute: another advantage of gradient-based sampling is that, instead of running a fixed number of steps, we can allocate adaptive compute per sample by stopping once the gradient norm drops below a threshold $g_{\min}$ (see the sampling sketch after this list)
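
To make the difference between the two training objectives concrete, here is a minimal PyTorch-style sketch of both losses. The interpolation conventions for $x_t$ / $x_\gamma$ and the linear `c(gamma)` schedule are illustrative assumptions rather than the paper's exact choices, and `model` stands for any network whose output has the same shape as its input.

```python
import torch

def flow_matching_loss(model, x, time_conditioned=True):
    """L_FM = ||f(x_t, t) - (x - eps)||^2, where the target velocity depends on t."""
    eps = torch.randn_like(x)                                  # Gaussian noise
    t = torch.rand(x.shape[0], device=x.device)                # t ~ U(0, 1)
    t_ = t.view(-1, *([1] * (x.dim() - 1)))                    # broadcast over non-batch dims
    x_t = (1 - t_) * eps + t_ * x                              # assumed interpolation convention
    target = x - eps                                           # non-equilibrium velocity target
    pred = model(x_t, t) if time_conditioned else model(x_t)   # uncond-FM drops the time input
    return ((pred - target) ** 2).mean()

def equilibrium_matching_loss(model, x):
    """L_EqM = ||f(x_gamma) - (eps - x) * c(gamma)||^2; gamma is never shown to the model."""
    eps = torch.randn_like(x)
    gamma = torch.rand(x.shape[0], device=x.device)            # gamma ~ U(0, 1), implicit
    g_ = gamma.view(-1, *([1] * (x.dim() - 1)))
    x_gamma = (1 - g_) * x + g_ * eps                          # assumed interpolation convention
    c = 1.0 - g_                                               # placeholder schedule for c(gamma)
    target = (eps - x) * c                                     # static (equilibrium) gradient target
    pred = model(x_gamma)                                      # no t / gamma argument at all
    return ((pred - target) ** 2).mean()
```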
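
The optional explicit energy model can be sketched the same way: the two constructions below follow the dot-product and squared-L2-norm formulas above, and autograd provides the $\nabla g$ used in $L_{\text{EqM-E}}$ or during sampling. This reuses the assumed `model` interface from the previous sketch.

```python
import torch

def energy_dot(model, x):
    """Dot-product energy: g(x) = x . f(x), one scalar per sample."""
    return (x * model(x)).flatten(1).sum(dim=1)

def energy_sq_norm(model, x):
    """Squared-L2-norm energy: g(x) = -1/2 * ||f(x)||_2^2, one scalar per sample."""
    out = model(x)
    return -0.5 * (out ** 2).flatten(1).sum(dim=1)

def energy_gradient(energy_fn, model, x):
    """grad_x g(x) via autograd; can stand in for f(x) in the EqM-E loss or in sampling."""
    x = x.detach().requires_grad_(True)
    g = energy_fn(model, x).sum()          # summing over the batch keeps per-sample gradients
    (grad,) = torch.autograd.grad(g, x)
    return grad
```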
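
Sampling is then plain gradient descent on the learned landscape, starting from Gaussian noise. The sketch below covers GD, NAG-GD (set `mu > 0`), and the adaptive-compute stopping rule; `grad_E` is any callable returning the learned gradient, either the implicit `model(x)` or `energy_gradient(...)` from the previous sketch, and the step size, momentum, and `g_min` values are placeholders rather than the paper's settings.

```python
import torch

def sample_gd(grad_E, shape, eta=0.1, mu=0.0, max_steps=250, g_min=None, device="cpu"):
    """Gradient-descent sampling; mu = 0 is plain GD, mu > 0 adds Nesterov momentum."""
    x = torch.randn(shape, device=device)          # start from pure Gaussian noise
    x_prev = x.clone()
    for _ in range(max_steps):
        lookahead = x + mu * (x - x_prev)          # Nesterov look-ahead point
        grad = grad_E(lookahead).detach()
        x_prev, x = x, (x - eta * grad).detach()   # x_{k+1} = x_k - eta * grad
        # adaptive compute: stop once the landscape is (nearly) flat for every sample
        if g_min is not None and grad.flatten(1).norm(dim=1).max() < g_min:
            break
    return x

# e.g. images = sample_gd(lambda x: model(x), (16, 3, 32, 32), eta=0.05, mu=0.9, g_min=0.05)
```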

Evaluations

Main Results:

Ablation Study:

Unique properties of Equilibrium Matching that are not supported by traditional diffusion/flow models:

  • Partially Noised Image Denoising: By learning equilibrium dynamics, Equilibrium Matching can take a partially noised image as input and denoise it directly.

  • Out-of-Distribution Detection: Equilibrium Matching can perform out-of-distribution (OOD) detection using the learned energy value; in-distribution (ID) samples typically have lower energies than OOD samples.

  • Composition: Naturally supports composing multiple models by adding their energy landscapes together (which corresponds to adding the gradients of each model); see the sketch below.
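
Since adding energy landscapes corresponds to adding their gradients, two independently trained EqM models can be composed by simply summing their outputs before each descent step. A minimal sketch, reusing the `sample_gd` sampler from the Methods section and hypothetical model names:

```python
def composed_gradient(model_a, model_b, x, w_a=1.0, w_b=1.0):
    """Gradient of the summed energy landscape w_a * E_a + w_b * E_b."""
    return w_a * model_a(x) + w_b * model_b(x)

# e.g. images = sample_gd(lambda x: composed_gradient(f_style, f_content, x), (16, 3, 32, 32))
```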

Author: Mr. For Example
Published: 2025-10-11
License: CC BY-NC-SA 4.0