Diffusion policy: Visuomotor policy learning via action diffusion: Why is Diffusion Policy more stable?
Article Summary
Citation:
@article{chi2023diffusion,
  title={Diffusion policy: Visuomotor policy learning via action diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={arXiv preprint arXiv:2303.04137},
  year={2023}
}
Chi, C., Feng, S., Du, Y., Xu, Z., Cousineau, E., Burchfiel, B. and Song, S., 2023. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137.
Paper: https://arxiv.org/abs/2303.04137
Code, data, and videos: https://diffusion-policy.cs.columbia.edu/
Original analysis article: https://blog.csdn.net/xzs1210652636/article/details/142500842
1. Background: the normalization constant $Z(\mathbf{o}, \theta)$ in EBMs
In an energy-based model (EBM), we have
$$p_\theta(\mathbf{a}\mid \mathbf{o}) = \frac{\exp\big[-E_\theta(\mathbf{o}, \mathbf{a})\big]}{Z(\mathbf{o}, \theta)},$$
where
- $E_\theta(\mathbf{o}, \mathbf{a})$ is the energy function; a lower value means $\mathbf{a}$ is more likely to be a "good" action;
- $Z(\mathbf{o}, \theta)$ is an integral (or sum) over the action $\mathbf{a}$ that normalizes $p_\theta(\mathbf{a}\mid \mathbf{o})$ into a valid probability distribution (the probabilities of all actions sum to 1).
Difficulty: $Z(\mathbf{o}, \theta)$ is intractable to compute, because it requires integrating (or summing) $\exp[-E_\theta(\mathbf{o}, \mathbf{a})]$ over the entire action space.
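To make this intractability concrete, here is a minimal sketch (not from the paper) using a toy 1-D action space and a hypothetical stand-in `energy` function for the learned network $E_\theta(\mathbf{o}, \mathbf{a})$. Evaluating $p_\theta(\mathbf{a}\mid \mathbf{o})$ exactly requires approximating $Z(\mathbf{o}, \theta)$ over the whole action space; the grid needed for that grows exponentially with the action dimension, which is why implicit policies avoid computing $Z$ directly.

```python
import numpy as np

def energy(obs, action):
    # Hypothetical stand-in for a learned energy network E_theta(o, a):
    # lower energy means the action is more compatible with the observation.
    return np.sum((action - np.tanh(obs)) ** 2, axis=-1)

def ebm_prob(obs, action, action_grid):
    """Exact p_theta(a | o) on a discretized 1-D action space.

    Z(o, theta) is approximated by a Riemann sum over `action_grid`.
    In D dimensions the grid has N^D points, so this normalization
    becomes infeasible for realistic action spaces.
    """
    da = action_grid[1] - action_grid[0]              # grid spacing
    energies = energy(obs, action_grid[:, None])      # E(o, a) at every grid point
    Z = np.sum(np.exp(-energies)) * da                # approximate integral over actions
    return np.exp(-energy(obs, action[None, :])) / Z

obs = np.array([0.3])
action = np.array([0.1])
grid = np.linspace(-1.0, 1.0, 1001)                   # toy 1-D action space in [-1, 1]
print(ebm_prob(obs, action, grid))
```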