
GENERATING PHYSICAL DYNAMICS UNDER PRIORS

Zihan Zhou1, Xiaoxue Wang2, Tianshu Yu1,*
1School of Data Science, The Chinese University of Hong Kong
2ChemLex Technology Co., Ltd.
zihanzhou@link.cuhk.edu.hk, wxx@chemlex.tech, yutianshu@cuhk.edu.hk
*corresponding author

ABSTRACT

Generating physically feasible dynamics in a data-driven context is challenging, especially when adhering to physical priors expressed in specific equations or formulas. Existing methodologies often overlook the integration of “physical priors”, resulting in violations of basic physical laws and suboptimal performance. In this paper, we introduce a novel framework that seamlessly incorporates physical priors into diffusion-based generative models to address this limitation. Our approach leverages two categories of priors:

  1. distributional priors, such as roto-translational invariance, and
  2. physical feasibility priors, including energy and momentum conservation laws and PDE constraints.

By embedding these priors into the generative process, our method can efficiently generate physically realistic dynamics, encompassing trajectories and flows. Empirical evaluations demonstrate that our method produces high-quality dynamics across a diverse array of physical phenomena with remarkable robustness, underscoring its potential to advance data-driven studies in AI4Physics. Our contributions signify a substantial advancement in the field of generative modeling, offering a robust solution to generate accurate and physically consistent dynamics.

1 Introduction

The generation of physically feasible dynamics is a fundamental challenge in the realm of data-driven modeling and AI4Physics. These dynamics, driven by Partial Differential Equations (PDEs), are ubiquitous in various scientific and engineering domains, including fluid dynamics (Kutz, 2017), climate modeling (Rasp et al., 2018), and materials science (Choudhury et al., 2022). Accurately generating such dynamics is crucial for advancing our understanding and predictive capabilities in these fields (Bazok & Ioannidis, 2019). Recently, generative models have revolutionized the study of physics by providing powerful tools to simulate and predict complex systems.

Generative vs. discriminative models.

Even when high-performing discriminative models for dynamics are available, such as finite element (Zhang et al., 2021; Uriarte et al., 2022), finite difference (Lu et al., 2021; Salman et al., 2022), finite volume (Ranade et al., 2021), or physics-informed neural network (PINN) methods (Raissi et al., 2019), generative models remain crucial in machine learning for their ability to capture the full data distribution. This enables more effective data synthesis (de Oliveira et al., 2017) and anomaly detection (M et al., 2017), and it enhances robustness and interpretability by modeling the joint distribution of data and labels, offering insights into unseen scenarios (Takeshita & Kalousis, 2021). Generative models are also pivotal in creative domains, such as drug discovery (Lavecchia, 2019), where they enable the creation of novel data samples.

Challenge.

However, the intrinsic complexity and high-dimensional nature of physical dynamics pose significant challenges for traditional learning systems. Recent advancements in generative modeling, particularly diffusion-based generative models (Song et al., 2020), have shown promise in capturing complex data distributions. These models iteratively refine noisy samples to match the target distribution, making them well-suited for high-dimensional data generation. Despite their success, existing approaches often overlook the incorporation of “physical priors” expressed in specific equations or formulas, which are essential for ensuring that the generated dynamics adhere to fundamental physical laws.

Solution.

In this work, we propose a novel framework that integrates priors into diffusion-based generative models to generate physically feasible dynamics. Our approach leverages two types of priors: Distributional priors, including roto-translational invariance and equivariance, ensure that models capture the intrinsic properties of the data rather than their specific representations; Physical feasibility priors, including energy and momentum conservation laws and PDE constraints, enforce the adherence to fundamental physical principles, thus improving the quality of generated dynamics.

The integration of priors into the generative process is a complex task that necessitates a deep understanding of the relevant mathematical and physical principles. Unlike prior works, where the physical system is modeled as a ground-truth value $\boldsymbol{x}_t$, diffusion generative models aim to characterize a full ground-truth distribution via $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ or $\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t]$ (notation in Equation 1). This fundamental difference complicates the direct application of priors based on ground-truth values to the output of generative models. In this work, we propose a framework that addresses this challenge by effectively embedding priors within the generative model’s output distribution. By incorporating these priors, we embed the physical laws of the system into the model, which can then efficiently produce physically plausible dynamics. This capability is particularly useful for studying physical phenomena where the governing equations are too complex to be learned purely from data.

Figure 1: Animated visualization of generated samples of shallow water dynamics, showcasing the variations over time. Use the latest version of Adobe Acrobat Reader to view.

Results.

Empirical evaluations of our method demonstrate its effectiveness in producing high-quality dynamics across a range of physical phenomena. Our approach exhibits high robustness and generalizability, making it a promising tool for the data-driven study of AI4Physics. In Fig. 1, we provide a generated sample of the shallow water dataset (Martinez-Aranda et al., 2018). The generated dynamics not only capture the intricate details of the physical processes but also adhere to the fundamental physical laws, offering an accurate and reliable representation of underlying systems.

Contribution.

In conclusion, our work presents a significant advancement in the field of data-driven generative modeling by introducing a novel framework that integrates physical priors into diffusion-based generative models. In all, our method

  1. improves the feasibility of generated dynamics, making them more aligned with physical principles compared to baseline methods;
  2. offers a solution to the longstanding challenge of generating physically feasible dynamics;
  3. paves the way for more accurate and reliable data-driven studies in various scientific and engineering domains, highlighting the potential of AI4Physics in advancing our understanding of complex physical systems.

2 Preliminaries

In Appendix A, we present a comprehensive review of Related Work, specifically focusing on three key areas: generative methods for physics, score-based diffusion models, and physics-informed neural networks. This section aims to provide foundational knowledge for readers who may not be familiar with these topics. We recommend that those seeking to deepen their understanding of these areas consult this appendix.

2.1 Diffusion Models

Diffusion models generate samples following an underlying distribution. Consider a random variable $\boldsymbol{x}_0 \in \mathbb{R}^n$ drawn from an unknown distribution $q_0$. Denoising diffusion probabilistic models (Song & Ermon, 2019; Song et al., 2020; Ho et al., 2020) describe a forward process $\boldsymbol{x}_t, t \in [0, T]$ governed by an Itô stochastic differential equation (SDE)

$$\mathrm{d} \boldsymbol{x}_t = f(t)\, \boldsymbol{x}_t\, \mathrm{d} t + g(t)\, \mathrm{d} \mathbf{w}_t, \quad \boldsymbol{x}_0 \sim q_0, \quad f(t) = \frac{\mathrm{d} \log \alpha_t}{\mathrm{d} t}, \quad g^2(t) = \frac{\mathrm{d} \sigma_t^2}{\mathrm{d} t} - 2\, \frac{\mathrm{d} \log \alpha_t}{\mathrm{d} t}\, \sigma_t^2, \quad (1)$$

where $\mathbf{w}_t \in \mathbb{R}^n$ denotes standard Brownian motion, and $\alpha_t$ and $\sigma_t$ are predetermined functions of $t$. This forward process has the closed-form solution $q_t(\boldsymbol{x}_t \mid \boldsymbol{x}_0) = \mathcal{N}(\boldsymbol{x}_t \mid \alpha_t \boldsymbol{x}_0, \sigma_t^2 \mathbf{I})$ and a corresponding reverse process, the probability flow ordinary differential equation (ODE) running from time $T$ to $0$, defined as (Song et al., 2020)

$$\frac{\mathrm{d} \boldsymbol{x}_t}{\mathrm{d} t} = f(t)\, \boldsymbol{x}_t - \frac{1}{2}\, g^2(t)\, \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t), \quad \boldsymbol{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \approx q_T(\boldsymbol{x}_T). \quad (2)$$

The marginal probability densities $\{q_t(\boldsymbol{x}_t)\}_{t=0}^T$ of the forward SDE align with those of the reverse ODE (Song et al., 2020). This indicates that if we can sample from $q_T(\boldsymbol{x}_T)$ and solve Equation 2, then the resulting $\boldsymbol{x}_0$ will follow the distribution $q_0$. By choosing $\alpha_t \to 0$ and $\sigma_t \to 1$ as $t \to T$, the distribution $q_T(\boldsymbol{x}_T)$ can be approximated by a standard normal distribution. The score $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ can be approximated by a deep learning model. The quality of the generated samples is contingent upon the model’s ability to accurately approximate the score functions (Kwon et al., 2022; Gao & Zhu, 2024): a more precise approximation yields a distribution that more closely aligns with the training data. This is a key motivation for embedding priors such as physical feasibility into the models. Section 3 elaborates on our methods for integrating distributional priors and physical feasibility priors, as well as the objectives for score matching.
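To make the sampling procedure concrete, the following is a minimal sketch of an Euler discretization of the probability-flow ODE in Equation 2. The schedule $\alpha_t = e^{-t^2/2}$, $\sigma_t^2 = 1 - \alpha_t^2$ and the placeholder `score_model` are our assumptions for illustration, not the paper's exact configuration; with this schedule, $f(t) = -t$ and $g^2(t) = 2t$.

```python
import torch

def alpha(t):   # assumed schedule: alpha_t = exp(-t^2 / 2)
    return torch.exp(-0.5 * t ** 2)

def sigma(t):   # sigma_t chosen so that sigma_t^2 = 1 - alpha_t^2
    return (1.0 - alpha(t) ** 2).sqrt()

def f(t):       # f(t) = d log(alpha_t) / dt = -t for this schedule
    return -t

def g2(t):      # g^2(t) = d(sigma_t^2)/dt - 2 f(t) sigma_t^2 = 2t here
    return 2.0 * t

@torch.no_grad()
def sample(score_model, shape, T=1.0, steps=500):
    """Euler integration of the probability-flow ODE (Eq. 2) from t=T to t~0."""
    x = sigma(torch.tensor(T)) * torch.randn(shape)   # x_T approx ~ q_T
    ts = torch.linspace(T, 1e-3, steps)
    for i in range(steps - 1):
        t, dt = ts[i], ts[i + 1] - ts[i]              # dt < 0: reverse time
        dx = f(t) * x - 0.5 * g2(t) * score_model(x, t)
        x = x + dx * dt
    return x
```

In practice, higher-order ODE solvers can replace the Euler step, but the structure of the reverse process is unchanged.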

2.2 Invariant Distributions

An invariant distribution refers to a probability distribution that remains unchanged under the action of a specified group of transformations. These transformations can include operations such as translations, rotations, or other symmetries, depending on the problem domain. Formally, let $\mathcal{G}$ be a group of transformations. A distribution $q$ is said to be $\mathcal{G}$-invariant if, for every transformation $\mathbf{G} \in \mathcal{G}$, we have $q(\mathbf{G}(\boldsymbol{x})) = q(\boldsymbol{x})$ for all $\boldsymbol{x}$. Invariance under group transformations is particularly significant in modeling distributions that exhibit symmetries. For instance, in the case of 3D coordinates, invariance under rigid transformations, i.e., translations and rotations (the SE(3) group), is essential for spatial understanding (Zhou et al., 2024). Equivariant models are usually required to embed invariance. A function (or model) $f: \mathbb{R}^n \to \mathbb{R}^n$ is said to be $(\mathcal{G}, \mathcal{L})$-equivariant, where $\mathcal{G}$ is a group of actions and $\mathcal{L}$ maps each group element to an operator on the output space, if for any $\mathbf{G} \in \mathcal{G}$, $f(\mathbf{G}(\boldsymbol{x})) = \mathcal{L}(\mathbf{G})\big(f(\boldsymbol{x})\big)$.
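To make these definitions concrete, the following minimal check (our illustration, not code from the paper) builds an unnormalized log-density over point clouds that depends only on pairwise distances, hence is rotation-invariant, and verifies numerically that its score transforms equivariantly, i.e., the score at a rotated input equals the rotated score:

```python
import torch

def log_q(x):
    """Unnormalized log-density on a point cloud x (N, n); it depends only on
    squared pairwise distances, hence is invariant to rotations/reflections."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # (N, N) squared distances
    return -torch.triu((d2 - 1.0) ** 2, diagonal=1).sum()

def score(x):
    """grad_x log q(x) via autograd."""
    x = x.detach().requires_grad_(True)
    return torch.autograd.grad(log_q(x), x)[0]

torch.manual_seed(0)
x = torch.randn(5, 3)
R, _ = torch.linalg.qr(torch.randn(3, 3))       # random orthogonal matrix
lhs = score(x @ R.T)                            # score at rotated coordinates
rhs = score(x) @ R.T                            # rotated score
print(torch.allclose(lhs, rhs, atol=1e-5))      # True, up to float error
```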

3 Method

In this study, we aim to investigate methodologies for enhancing the capability of diffusion models to approximate the targeted score functions. We have two primary objectives:

  1. To incorporate distributional priors, such as translational and rotational invariance, which will aid in selecting the appropriate model for training objective functions;
  2. To impose physical feasibility priors on the diffusion model, necessitating the injection of priors into the model’s output, which is a quantity of the ground-truth distribution (specifically, $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ or $\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t]$).

In this section, we consider the forward diffusion process given by Equation 1, where $\boldsymbol{x}_t = \alpha_t \boldsymbol{x}_0 + \sigma_t \boldsymbol{e}_t$ with $\boldsymbol{e}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.

3.1 Incorporating Distributional Priors

In this section, we study the score function $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ for $\mathcal{G}$-invariant distributions. Understanding its properties can guide the selection of models with the desired equivariance, facilitating sampling from the $\mathcal{G}$-invariant distribution. In the following, we assume that the sufficient conditions of Theorem 1 hold, so that the marginal distributions $q_t$ are $\mathcal{G}$-invariant. The definitions of the terminology and the proof of the theorem can be found in Appendix F.1.

Theorem 1 (Sufficient conditions for the invariance of $q_0$ to imply the invariance of $q_t$): Let $q_0$ be a $\mathcal{G}$-invariant distribution. If every $\mathbf{G} \in \mathcal{G}$ is a volume-preserving diffeomorphism and isometry, and for all $0 < \epsilon < 1$ there exists $\mathbf{H} \in \mathcal{G}$ such that $\mathbf{H}(\epsilon \boldsymbol{x}) = \epsilon\, \mathbf{G}(\boldsymbol{x})$ for all $\boldsymbol{x}$, then $q_t$ is also $\mathcal{G}$-invariant. For instance, the translation group satisfies this condition: for $\mathbf{G}(\boldsymbol{x}) = \boldsymbol{x} + \boldsymbol{b}$, taking $\mathbf{H}(\boldsymbol{x}) = \boldsymbol{x} + \epsilon \boldsymbol{b}$ gives $\mathbf{H}(\epsilon \boldsymbol{x}) = \epsilon\, \mathbf{G}(\boldsymbol{x})$.

Property of score functions. Let $q_t$ be a $\mathcal{G}$-invariant distribution. By the chain rule, for all $\mathbf{G} \in \mathcal{G}$ we have $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t) = \nabla_{\boldsymbol{x}} \log q_t(\mathbf{G}(\boldsymbol{x}_t)) = \frac{\partial \mathbf{G}(\boldsymbol{x}_t)}{\partial \boldsymbol{x}}\, \nabla_{\mathbf{G}(\boldsymbol{x}_t)} \log q_t(\mathbf{G}(\boldsymbol{x}_t))$. Hence,

$$\nabla_{\mathbf{G}(\boldsymbol{x}_t)} \log q_t(\mathbf{G}(\boldsymbol{x}_t)) = \left( \frac{\partial \mathbf{G}(\boldsymbol{x}_t)}{\partial \boldsymbol{x}} \right)^{-1} \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t). \quad (3)$$

This implies that the score function of a $\mathcal{G}$-invariant distribution is $(\mathcal{G}, \nabla^{-1})$-equivariant, where $\nabla^{-1}$ denotes the inverse-Jacobian operator $\mathbf{G} \mapsto \big(\partial \mathbf{G}(\boldsymbol{x}) / \partial \boldsymbol{x}\big)^{-1}$. We should therefore use a $(\mathcal{G}, \nabla^{-1})$-equivariant model to predict the score function. The loss objective is given by

$$\mathcal{J}_{\text{score}}(\boldsymbol{\theta}) = \mathbb{E}_{t, \boldsymbol{x}_t} \left[ w(t) \left\| \mathbf{s}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) - \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t) \right\|^2 \right], \quad (4)$$

where $w(t)$ is a positive weighting function and $\mathbf{s}_{\boldsymbol{\theta}}$ is a $(\mathcal{G}, \nabla^{-1})$-equivariant model. We discuss the handling of the intractable score function $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ subsequently in Equation 6.

In the context of simulating physical dynamics, two distributional priors are commonly considered: $\mathrm{SE}(n)$ invariance and permutation invariance. They ensure that the learned representations are consistent with the fundamental symmetries of physical laws, including rigid body transformations and the indistinguishability of particles, thereby enhancing the model’s ability to generalize across different physical scenarios. The derivations for the following examples can be found in Appendix F.2.

Example 1 (SE(n)-invariant distribution): If $q_0$ is an $\mathrm{SE}(n)$-invariant distribution, then $q_t$ is also $\mathrm{SE}(n)$-invariant. The score function of an $\mathrm{SE}(n)$-invariant distribution is $\mathrm{SO}(n)$-equivariant and translation-invariant.

Example 2 (Permutation-invariant distribution): If $q_0$ is a permutation-invariant distribution, then $q_t$ is also permutation-invariant. The score function of a permutation-invariant distribution is permutation-equivariant.

In the following, we show that by using such a $(\mathcal{G}, \nabla^{-1})$-equivariant model, we are essentially training a model that focuses on the intrinsic structure of the data rather than its representation form.

Equivalence class manifold for invariant distributions. An equivalence class manifold (ECM) is a minimal subset of the sample space containing exactly one representative of each equivalence class of samples, where samples related by a group transformation are considered indistinguishable. For example, in $n$-dimensional space, coordinates that have undergone rotation and translation maintain their pairwise distances, which allows a single set of coordinates to represent all other coordinates with the same distance matrix, thereby forming an equivalence class manifold (see Appendix B for the formal definition and examples). By incorporating the invariance prior into the training set, we can construct the ECM from the training set or a mini-batch of samples. The ECM enables models to concentrate on the intrinsic structure of the data, thereby enhancing generalization and robustness to irrelevant variations. Assume that $\boldsymbol{x}_t$ follows a $\mathcal{G}$-invariant distribution $q_t$, and let $\varphi_t$ map $\boldsymbol{x}_t$ to the point in the ECM with the same intrinsic structure. Then there exists $\mathbf{G} \in \mathcal{G}$ such that $\mathbf{G}(\varphi_t(\boldsymbol{x}_t)) = \boldsymbol{x}_t$. Since $\varphi_t$ is $\mathcal{G}$-invariant, we have $\varphi_t(\boldsymbol{x}_t) = \varphi_t(\mathbf{G}(\varphi_t(\boldsymbol{x}_t)))$. Let $q_{\varphi_t}$ denote the distribution induced by $q_t$ on the ECM. Because $q_t$ is $\mathcal{G}$-invariant, $q_t(\boldsymbol{x}_t) = q_t(\varphi_t(\boldsymbol{x}_t))$. Taking the logarithm, differentiating, and applying Equation 3 with $\mathbf{G}(\varphi_t(\boldsymbol{x}_t)) = \boldsymbol{x}_t$, we obtain

$$\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t) = \left( \frac{\partial \mathbf{G}(\varphi_t(\boldsymbol{x}_t))}{\partial \boldsymbol{x}} \right)^{-1} \nabla_{\varphi_t(\boldsymbol{x}_t)} \log q_{\varphi_t}(\varphi_t(\boldsymbol{x}_t)). \quad (5)$$

This implies that the score function of the $\mathcal{G}$-invariant distribution is closely related to the score function of the induced distribution on the ECM. In particular, if we have a $(\mathcal{G}, \nabla^{-1})$-equivariant model that predicts the score functions on the ECM, then the same model predicts the score functions at all points reachable from the ECM under the group operations. We summarize this result in the following theorem, whose proof can be found in Appendix F.3.

Theorem 2 (Equivalence class manifold representation): If we have a $(\mathcal{G}, \nabla^{-1})$-equivariant model such that $\mathbf{s}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) = \nabla_{\boldsymbol{x}} \log q_{\varphi_t}(\boldsymbol{x}_t)$ almost surely for $\boldsymbol{x}_t \in \mathrm{ECM}$, then $\mathbf{s}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) = \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ almost surely.
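As a concrete illustration, the sketch below (a hypothetical construction of ours; the paper's formal treatment is in Appendix B) canonicalizes point clouds under SE(n) by centering and principal-axis alignment, so that rigidly transformed copies of a sample map to the same ECM representative (assuming non-degenerate principal axes):

```python
import torch

def phi(x):
    """Map coordinates x (N, n) to a canonical ECM representative.

    Translations are removed by centering; rotations by aligning to the
    principal axes of the point cloud; reflection/sign ambiguities of the
    eigenvectors are fixed by a simple convention.
    """
    x = x - x.mean(dim=0, keepdim=True)            # quotient out translation
    _, V = torch.linalg.eigh(x.T @ x)              # principal axes (ascending)
    x = x @ V                                      # quotient out rotation
    s = torch.sign(x.sum(dim=0, keepdim=True))     # resolve eigenvector signs
    s[s == 0] = 1.0
    return x * s
```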

Objective for fitting the score function. The score function $\nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)$ is generally intractable, so we consider the noise matching and data matching objectives (Vincent, 2011; Song et al., 2020; Zheng et al., 2023), whose forms and optimal solutions are given by

$$\begin{aligned} \mathcal{J}_{\text{noise}}(\boldsymbol{\theta}) &= \mathbb{E}_{t, \boldsymbol{x}_0, \boldsymbol{e}_t} \left[ w(t) \left\| \mathbf{s}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) - \boldsymbol{e}_t \right\|^2 \right], & \mathbf{s}_{\boldsymbol{\theta}}^*(\boldsymbol{x}_t, t) &= -\sigma_t \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t); & (6a) \\ \mathcal{J}_{\text{data}}(\boldsymbol{\theta}) &= \mathbb{E}_{t, \boldsymbol{x}_0, \boldsymbol{e}_t} \left[ w(t) \left\| \mathbf{z}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) - \boldsymbol{x}_0 \right\|^2 \right], & \mathbf{z}_{\boldsymbol{\theta}}^*(\boldsymbol{x}_t, t) &= \frac{1}{\alpha_t} \boldsymbol{x}_t + \frac{\sigma_t^2}{\alpha_t} \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t). & (6b) \end{aligned}$$

The diffusion objectives for both the noise predictor $\mathbf{s}_{\boldsymbol{\theta}}$ and the data predictor $\mathbf{z}_{\boldsymbol{\theta}}$ are intrinsically linked to the score function, thereby inheriting its characteristics and properties. However, the optimal data predictor contains the term $\frac{1}{\alpha_t} \boldsymbol{x}_t$, whose numerical range is unstable as $\alpha_t \to 0$. This instability prevents the data predictor from directly inheriting the straightforward equivariance property of the score function. Therefore, to incorporate $\mathcal{G}$-invariance, it is advisable to employ noise matching (Equation 6a) with a $(\mathcal{G}, \nabla^{-1})$-equivariant model $\mathbf{s}_{\boldsymbol{\theta}}$, matching the property of the score function.

A specific instance of a distributional prior is defined by samples that conform to the constraints imposed by PDEs. In this context, the dynamics at any given spatial location depend solely on the characteristics of the system within its local vicinity, rather than on absolute spatial coordinates. Under these conditions, it is appropriate to employ translation-invariant models for both noise matching and data matching. Nevertheless, the samples in question exhibit significant smoothness. As a result, utilizing the noise matching objective necessitates that the model’s output be accurate at every individual pixel. In contrast, applying the data matching objective only requires the model to produce smooth output values. Therefore, it is recommended to adopt the data matching objective for this purpose. The selection between data matching and noise matching plays a critical role in determining the quality of the generated samples. For detailed experimental results, refer to Sec. 4.3.
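The following sketch implements one Monte-Carlo term of each objective under the forward perturbation $\boldsymbol{x}_t = \alpha_t \boldsymbol{x}_0 + \sigma_t \boldsymbol{e}_t$; the schedule and the placeholder `model` are our assumptions, and $w(t)$ is set to $1$ for brevity:

```python
import torch

def alpha(t):                       # assumed schedule, as in the sampler sketch
    return torch.exp(-0.5 * t ** 2)

def diffusion_loss(model, x0, mode="noise"):
    """One Monte-Carlo term of Eq. 6a (mode='noise') or Eq. 6b (mode='data')."""
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)))  # t ~ U(0,1), broadcastable
    a = alpha(t)
    s = (1.0 - a ** 2).sqrt()                             # sigma_t
    e = torch.randn_like(x0)                              # e_t ~ N(0, I)
    xt = a * x0 + s * e                                   # forward perturbation
    target = e if mode == "noise" else x0                 # Eq. 6a vs. Eq. 6b
    return ((model(xt, t) - target) ** 2).mean()          # w(t) = 1 for brevity
```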

Remark 1. In this section, we primarily explore the principle for incorporating distributional priors by selecting models with particular characteristics. Specifically:

  1. When the distribution exhibits $\mathcal{G}$-invariance, a $(\mathcal{G}, \nabla^{-1})$-equivariant model should be employed alongside the noise matching objective (Equation 6a).
  2. For samples that are subject to PDE constraints and exhibit high smoothness, the data matching objective (Equation 6b) is recommended.

3.2 Incorporating Physical Feasibility Priors

In this section, we explore how to incorporate physical feasibility priors, such as physical laws and explicit PDE constraints, into the noise and data matching objectives of diffusion models. By Tweedie's formula (Oksendal, 2011; Kim & Ye, 2021; Chung et al., 2022), we have $\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t] = \frac{1}{\alpha_t}\left(\boldsymbol{x}_t + \sigma_t^2 \nabla_{\boldsymbol{x}} \log q_t(\boldsymbol{x}_t)\right)$. Thus, physical feasibility priors can be imposed by ensuring that the output of the diffusion model, $\mathbf{s}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)$ or $\mathbf{z}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)$, satisfies the physical constraints, such as conservation laws or PDEs.
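As an illustration, the snippet below (placeholder names of ours, not the paper's API) recovers $\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t]$ from a score prediction and then projects the estimate onto a linear constraint set, here zero total momentum, one of the multilinear constraints covered by Theorem 3 below:

```python
import torch

def x0_from_score(xt, score_pred, alpha_t, sigma_t):
    """Tweedie's formula: E[x_0 | x_t] = (x_t + sigma_t^2 * score) / alpha_t."""
    return (xt + sigma_t ** 2 * score_pred) / alpha_t

def project_zero_momentum(x0_hat, masses):
    """Project an x_0 estimate onto the zero-total-momentum set.

    x0_hat: (N, 6) with positions x0_hat[:, :3] and velocities x0_hat[:, 3:];
    masses: (N,). Subtracting the center-of-mass velocity enforces the
    linear (hence multilinear) constraint sum_i m_i v_i = 0.
    """
    v = x0_hat[:, 3:]
    v_cm = (masses[:, None] * v).sum(dim=0) / masses.sum()
    out = x0_hat.clone()
    out[:, 3:] = v - v_cm
    return out
```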

Theorem 3 (Multilinear Jensen's gap): The optimizer of $\mathbb{E}_{t, \boldsymbol{x}_0, \boldsymbol{e}_t} \left[ w(t) \left\| \mathbf{z}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t) - \boldsymbol{x}_0 \right\|^2 \right]$ satisfies the physical constraints if the ground-truth data $\boldsymbol{x}_0$ adheres to the physical laws, provided that the model $\mathbf{z}_{\boldsymbol{\theta}}$ is expressive enough to capture the data distribution and the constraints are multilinear in nature.

This theorem implies that by optimizing the data matching objective, we can enforce physical feasibility priors implicitly through the data distribution, provided the constraints are multilinear (e.g., conservation of energy and momentum). For non-multilinear constraints, such as those arising from complex PDEs, explicit incorporation is required.

Explicit PDE Constraints. For PDE-constrained dynamics, we modify the loss function to include a penalty term that enforces the PDE constraints. Let L PDE ( x ) \mathcal{L}_{\text{PDE}}(\boldsymbol{x}) LPDE(x) represent the PDE residual, which measures the deviation of the generated samples from the governing PDE. The modified loss function is:

$$\mathcal{J}_{\text{total}}(\boldsymbol{\theta}) = \mathcal{J}_{\text{data}}(\boldsymbol{\theta}) + \lambda\, \mathbb{E}_{t, \boldsymbol{x}_0, \boldsymbol{e}_t} \left[ \mathcal{L}_{\text{PDE}}(\mathbf{z}_{\boldsymbol{\theta}}(\boldsymbol{x}_t, t)) \right], \quad (7)$$

where λ \lambda λ is a hyperparameter controlling the strength of the PDE constraint. This approach ensures that the generated samples not only match the data distribution but also adhere to the PDE constraints.
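A minimal sketch of such a PDE penalty, using 1D Burgers' equation and central finite differences, is shown below; the grid spacings `dt`, `dx` and viscosity `nu` are assumed dataset constants, and the residual discretization is illustrative rather than the paper's exact choice:

```python
import torch

def burgers_residual(u, dt, dx, nu):
    """u: (B, T, X) generated fields. Mean squared residual of
    u_t + u * u_x - nu * u_xx = 0 on interior grid points."""
    u_t  = (u[:, 2:, 1:-1] - u[:, :-2, 1:-1]) / (2 * dt)          # du/dt
    u_x  = (u[:, 1:-1, 2:] - u[:, 1:-1, :-2]) / (2 * dx)          # du/dx
    u_xx = (u[:, 1:-1, 2:] - 2 * u[:, 1:-1, 1:-1] + u[:, 1:-1, :-2]) / dx ** 2
    res = u_t + u[:, 1:-1, 1:-1] * u_x - nu * u_xx
    return (res ** 2).mean()

def total_loss(data_loss, z_pred, dt, dx, nu, lam=0.1):
    """Equation 7: data matching plus the weighted PDE penalty."""
    return data_loss + lam * burgers_residual(z_pred, dt, dx, nu)
```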

4 Experiments

In this section, we evaluate the performance of our proposed framework on various datasets, including PDE datasets and particle dynamics datasets. We provide a detailed account of the backbone selection and training strategies employed for each dataset. We also provide ablation studies in Sec. 4.3 on 1) data matching versus noise matching for different datasets, revealing that incorporating a distributional prior enhances model performance; and 2) the effect of Jensen's gap, finding that individual constraints, if not properly handled, lead to underperformance, while appropriately managing these priors with our proposed methods yields significant performance improvements.

4.1 PDE Datasets

PDE datasets, including advection (Zang, 1991), Darcy flow (Li et al., 2022), Burgers (Rudy et al., 2017), and Navier-Stokes (Li et al., 2020), represent a diverse set of physical phenomena. These datasets enable the simulation of complex systems, demonstrating the capability of models for broader application across a wide range of PDE datasets. Through this, they facilitate advances in understanding diverse natural and engineered processes.

Experiment settings. The PDE constraints for the above datasets are given by:

  • Advection: $\frac{\partial u}{\partial t} + \boldsymbol{c} \cdot \nabla u = 0$, with constant advection velocity $\boldsymbol{c}$
  • Darcy flow: $-\nabla \cdot (a(\boldsymbol{x}) \nabla u) = f(\boldsymbol{x})$
  • Burgers: $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$
  • Navier-Stokes: $\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{u}$

We train our diffusion models with the modified loss function (Equation 7) to incorporate these PDE constraints. The backbone architecture is based on a U-Net (Ronneberger et al., 2015) with group-equivariant convolutions to enforce S E ( n ) \mathrm{SE}(n) SE(n) invariance. The training data consists of simulated solutions to these PDEs, and we evaluate the generated samples based on their adherence to the PDE constraints and their visual fidelity to the ground-truth dynamics.

4.2 Particle Dynamics Datasets

We train diffusion models to simulate the dynamics of chaotic three-body systems in 3D (Zhou & Yu, 2023) and five-spring systems in 2D (Kuramoto, 1975; Kipf et al., 2018) (see Appendix D.1 for visualizations of the datasets). In the case of the three-body system, we unconditionally generate the positions and velocities of three particles whose dynamics are governed by gravitational interactions. The stochastic nature of this dataset arises from the random distribution of the initial positions and velocities. In the five-spring system, each pair of particles has a 50% probability of being connected by a spring. The movements of the particles are influenced by the spring forces, which cause stretching or compression interactions. We conditionally generate the positions and velocities of the five particles based on their spring connectivity.

Notations. The features of the datasets are represented as $\boldsymbol{x} \in \mathbb{R}^{N \times D}$, where $N$ is the number of particles and $D$ is the dimensionality of the position and velocity vectors (e.g., $D = 6$ for 3D position and velocity). The physical constraints include conservation of total momentum and total energy, which are enforced through the data matching objective.
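For reference, a minimal sketch of how such conservation errors can be measured on generated trajectories; unit masses and $G = 1$ are our illustrative assumptions, not necessarily the datasets' exact conventions:

```python
import torch

def conservation_errors(traj, eps=1e-6):
    """traj: (T, N, 6) with positions traj[..., :3] and velocities traj[..., 3:].

    Returns the mean drift of total energy and total momentum over the
    rollout, assuming unit masses and G = 1 (gravitational pair potential).
    """
    pos, vel = traj[..., :3], traj[..., 3:]
    p = vel.sum(dim=1)                                     # (T, 3) total momentum
    ke = 0.5 * (vel ** 2).sum(dim=(1, 2))                  # kinetic energy per frame
    d = torch.cdist(pos, pos).clamp_min(eps)               # (T, N, N) distances
    pe = -torch.triu(1.0 / d, diagonal=1).sum(dim=(1, 2))  # pairwise potential
    e = ke + pe
    energy_err = (e - e[0]).abs().mean().item()
    momentum_err = (p - p[0]).norm(dim=-1).mean().item()
    return energy_err, momentum_err
```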

Figure 2: Visualization of generated samples from the three-body (first row) and five-spring (second row) datasets. The leftmost figures in each row represent methods without priors, the middle figures correspond to our proposed methods, and the rightmost figures illustrate the physical properties as they evolve over time. Both total momentum and total energy should remain conserved. The samples generated by our methods demonstrate improved adherence to these conservation laws compared to baseline methods.

4.3 Ablation Studies

We conduct ablation studies to evaluate the impact of incorporating distributional and physical feasibility priors. We compare the following configurations:

  1. Baseline (No Priors): Diffusion models trained without any distributional or physical priors.
  2. Distributional Priors Only: Models trained with $\mathrm{SE}(n)$ and permutation invariance but without explicit physical constraints.
  3. Physical Priors Only: Models trained with physical constraints (e.g., PDE residuals or conservation laws) but without distributional priors.
  4. Proposed Method: Models trained with both distributional and physical priors, using the modified loss function (Equation 7).

Results. Table 3 summarizes the sample quality for the three-body and five spring datasets. For both datasets, we simulate the ground-truth future motion based on the current states of the generated samples and report the Mean Squared Error (MSE) between the ground-truth motion and the generated ones. We also calculate the error in physical feasibility, such as conservation of energy and momentum, which should remain unchanged along the evolution of the systems.
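A sketch of this evaluation protocol, with `simulate` standing in for the dataset's ground-truth integrator (a placeholder of ours):

```python
import torch

def rollout_mse(simulate, gen_traj):
    """gen_traj: (T, N, D) generated trajectory.

    simulate(state, steps) is assumed to return a (T, N, D) ground-truth
    rollout from the given initial state; the MSE compares it against the
    generated continuation.
    """
    ref = simulate(gen_traj[0], steps=gen_traj.shape[0])
    return ((gen_traj - ref) ** 2).mean().item()
```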

Table 3: Sample Quality of the Three-Body and Five Spring Datasets

| Method | Three-Body MSE | Three-Body Energy Error | Three-Body Momentum Error | Five Spring MSE | Five Spring Energy Error | Five Spring Momentum Error |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline (No Priors) | 0.052 | 0.134 | 0.128 | 0.067 | 0.145 | 0.139 |
| Distributional Priors Only | 0.041 | 0.092 | 0.087 | 0.053 | 0.101 | 0.095 |
| Physical Priors Only | 0.038 | 0.045 | 0.042 | 0.049 | 0.048 | 0.046 |
| Proposed Method | 0.029 | 0.021 | 0.019 | 0.035 | 0.022 | 0.020 |

The results demonstrate that incorporating both distributional and physical priors significantly improves the quality of generated samples. The proposed method achieves the lowest MSE and errors in energy and momentum conservation, indicating that the generated dynamics are both visually accurate and physically consistent.

5 Conclusion

In this work, we introduced a novel framework for generating physically feasible dynamics by integrating distributional and physical feasibility priors into diffusion-based generative models. Our approach addresses the limitations of existing methods by ensuring that generated dynamics adhere to fundamental physical laws, such as conservation of energy and momentum, and PDE constraints. Empirical evaluations on PDE and particle dynamics datasets demonstrate the effectiveness of our method in producing high-quality, physically consistent dynamics. Our contributions include:

  1. A robust framework for embedding physical priors into diffusion models, improving the feasibility of generated dynamics.
  2. A solution to the challenge of generating physically plausible dynamics in data-driven settings.
  3. Demonstrations of improved performance across diverse physical phenomena, highlighting the potential of our approach for AI4Physics applications.

Future work will explore the scalability of our framework to larger and more complex systems, as well as the integration of additional physical priors, such as thermodynamic constraints, to further enhance the applicability of our method.
