Research Paper · Researchia:202512.2551d932 · Computer Vision > Computer Science

Diffusion model

Prof. Marie Laurent (Sorbonne University)

Abstract


In machine learning, diffusion models, also known as diffusion-based generative models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of two major components: the forward diffusion process and the reverse sampling process. The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements that are distributed similarly to the original dataset. A diffusion model models data as generated by a diffusion process, whereby a new datum performs a random walk with drift through the space of all possible data. A trained diffusion model can be sampled in many ways, with different efficiency and quality.

There are various equivalent formalisms, including Markov chains, denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. They are typically trained using variational inference. The model responsible for denoising is typically called the "backbone". The backbone may be of any kind, but it is typically a U-net or a transformer.

As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, image generation, and video generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise. The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise and applying the network iteratively to denoise the image. Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These models typically combine diffusion models with other models, such as text encoders and cross-attention modules, to allow text-conditioned generation.

Other than computer vision, diffusion models have also found applications in natural language processing, such as text generation and summarization; sound generation; and reinforcement learning.
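The "start from noise, denoise iteratively" procedure described above can be sketched in a few lines. This is a minimal NumPy sketch of the standard DDPM ancestral sampler, not a full implementation: `eps_model` is a hypothetical stand-in for a trained noise-prediction network, and the toy usage below plugs in a dummy model that predicts zero noise.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng=np.random.default_rng(0)):
    """Minimal DDPM ancestral sampling loop (sketch).

    eps_model(x, t) is assumed to be a trained network that predicts
    the noise added at step t; here it is just a callable stand-in.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise x_T
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)       # predicted noise at step t
        # posterior mean update for x_{t-1} given x_t (standard DDPM rule)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # add fresh Gaussian noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# toy usage: a dummy "denoiser" that predicts zero noise
betas = np.linspace(1e-4, 0.02, 50)
sample = ddpm_sample(lambda x, t: np.zeros_like(x), (4,), betas)
```

With a real trained `eps_model`, the same loop walks the noise back toward the data distribution; the number of steps and the noise schedule `betas` trade off sampling speed against quality.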

== Denoising diffusion model ==

=== Non-equilibrium thermodynamics ===

Diffusion models were introduced in 2015 as a method to train a model that can sample from a highly complex probability distribution. They used techniques from non-equilibrium thermodynamics, especially diffusion.

Consider, for example, how one might model the distribution of all naturally occurring photos. Each image is a point in the space of all images, and the distribution of naturally occurring photos is a "cloud" in that space which, by repeatedly adding noise to the images, diffuses out to the rest of the image space, until the cloud becomes all but indistinguishable from a Gaussian distribution $\mathcal{N}(0, I)$. A model that can approximately undo the diffusion can then be used to sample from the original distribution. This is studied in "non-equilibrium" thermodynamics, as the starting distribution is not in equilibrium, unlike the final distribution. The equilibrium distribution is the Gaussian distribution $\mathcal{N}(0, I)$, with pdf $\rho(x) \propto e^{-\frac{1}{2}\|x\|^{2}}$. This is just the Maxwell–Boltzmann distribution of particles in a potential well $V(x) = \frac{1}{2}\|x\|^{2}$.
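The diffusion toward the Gaussian equilibrium can be checked numerically. The sketch below, under an assumed constant noise level, starts from a decidedly non-Gaussian distribution (uniform) and repeatedly applies variance-preserving noising; after many steps the samples are statistically indistinguishable from $\mathcal{N}(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
# start from a non-Gaussian "data" distribution: uniform on [0, 2]
x = rng.uniform(0.0, 2.0, size=100_000)

beta = 0.02  # assumed constant noise level for illustration
for _ in range(1000):
    # variance-preserving noising step: shrink the signal, add Gaussian noise
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# after many steps, mean is near 0 and standard deviation near 1
mean, std = x.mean(), x.std()
```

The shrink factor $\sqrt{1-\beta}$ drives the original mean and variance to zero exponentially fast, while the injected noise builds the variance back up to exactly 1, so the cloud of samples settles into the $\mathcal{N}(0, I)$ equilibrium regardless of where it started.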

=== Denoising Diffusion Probabilistic Model (DDPM) ===

The 2020 paper proposed the Denoising Diffusion Probabilistic Model (DDPM), which improves upon the previous method by variational inference.

==== Forward diffusion ====

To present the model, some notation is required. Fixed constants $\beta_1, \ldots, \beta_T \in (0, 1)$ define the noise schedule; write $\alpha_t := 1 - \beta_t$, $\bar\alpha_t := \alpha_1 \cdots \alpha_t$, and $\sigma_t := \sqrt{1 - \bar\alpha_t}$. $\mathcal{N}(\mu, \Sigma)$ denotes the normal distribution with mean $\mu$ and covariance $\Sigma$, and $\mathcal{N}(x \mid \mu, \Sigma)$ its density at $x$. A vertical bar denotes conditioning.

A forward diffusion process starts at some starting point $x_0 \sim q$, where $q$ is the distribution to be learned, then repeatedly adds noise to it by

$$x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, z_t$$

where $z_1, \ldots, z_T$ are IID samples from $\mathcal{N}(0, I)$. The coefficients $\beta_t$ can be chosen freely; a common convention normalizes the data so that $\operatorname{Var}(X_0) = I$. Whatever the distribution of $x_0$, if it has finite second moment, then $x_t \mid x_0$ converges in distribution to $\mathcal{N}(0, I)$ as $t \to \infty$. The entire diffusion process then satisfies

$$q(x_{0:T}) = q(x_0)\, q(x_1 \mid x_0) \cdots q(x_T \mid x_{T-1}) = q(x_0)\, \mathcal{N}(x_1 \mid \sqrt{\alpha_1}\, x_0, \beta_1 I) \cdots \mathcal{N}(x_T \mid \sqrt{\alpha_T}\, x_{T-1}, \beta_T I)$$

or

$$\ln q(x_{0:T}) = \ln q(x_0) - \sum_{t=1}^{T} \frac{1}{2\beta_t} \|x_t - \sqrt{1 - \beta_t}\, x_{t-1}\|^2 + C$$

where $C$ is a normalization constant. Conditioning on both $x_t$ and $x_0$, the intermediate step is again Gaussian:

$$x_{t-1} \mid x_t, x_0 \sim \mathcal{N}\big(\tilde\mu_t(x_t, x_0),\ \tilde\sigma_t^2 I\big)$$
In particular, notice that for large $t$, the variable

$$x_t \mid x_0 \sim \mathcal{N}\big(\sqrt{\bar\alpha_t}\, x_0,\ \sigma_t^2 I\big)$$

converges to $\mathcal{N}(0, I)$.
(Article truncated for display)

Source

This content is sourced from Wikipedia, the free encyclopedia. Read full article on Wikipedia


Submission: 12/25/2025

