Diffusion policy: Visuomotor policy learning via action diffusion: Why is Diffusion Policy more stable?
Article Summary
Citation:
@article{chi2023diffusion,
  title={Diffusion policy: Visuomotor policy learning via action diffusion},
  author={Chi, Cheng and Feng, Siyuan and Du, Yilun and Xu, Zhenjia and Cousineau, Eric and Burchfiel, Benjamin and Song, Shuran},
  journal={arXiv preprint arXiv:2303.04137},
  year={2023}
}
Chi, C., Feng, S., Du, Y., Xu, Z., Cousineau, E., Burchfiel, B. and Song, S., 2023. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137.
Paper: https://arxiv.org/abs/2303.04137
Code, data, and videos: https://diffusion-policy.cs.columbia.edu/
Original analysis article: https://blog.csdn.net/xzs1210652636/article/details/142500842
1. Background: the normalization constant $Z(\mathbf{o}, \theta)$ in EBMs
In an energy-based model (EBM), we have
$$p_\theta(\mathbf{a}\mid \mathbf{o}) = \frac{\exp\big[-E_\theta(\mathbf{o}, \mathbf{a})\big]}{Z(\mathbf{o}, \theta)},$$
where
- $E_\theta(\mathbf{o}, \mathbf{a})$ is the energy function; a lower value means $\mathbf{a}$ is more likely to be a "good" action;
- $Z(\mathbf{o}, \theta)$ is an integral (or sum) over the action $\mathbf{a}$ that normalizes $p_\theta(\mathbf{a}\mid \mathbf{o})$ into a valid probability distribution (the probabilities of all actions sum to 1).
Difficulty: $Z(\mathbf{o}, \theta)$ is intractable to compute, because it requires integrating (or summing) $\exp[-E_\theta(\mathbf{o}, \mathbf{a})]$ over the entire action space.
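To make this intractability concrete, here is a minimal sketch (not from the paper) using a toy 1-D action space and a hypothetical stand-in `energy` function for the learned network $E_\theta(\mathbf{o}, \mathbf{a})$. Evaluating $p_\theta(\mathbf{a}\mid \mathbf{o})$ exactly requires approximating $Z(\mathbf{o}, \theta)$ over the whole action space; the grid needed for that grows exponentially with the action dimension, which is why implicit policies avoid computing $Z$ directly.

```python
import numpy as np

def energy(obs, action):
    # Hypothetical stand-in for a learned energy network E_theta(o, a):
    # lower energy means the action is more compatible with the observation.
    return np.sum((action - np.tanh(obs)) ** 2, axis=-1)

def ebm_prob(obs, action, action_grid):
    """Exact p_theta(a | o) on a discretized 1-D action space.

    Z(o, theta) is approximated by a Riemann sum over `action_grid`.
    In D dimensions the grid has N^D points, so this normalization
    becomes infeasible for realistic action spaces.
    """
    da = action_grid[1] - action_grid[0]              # grid spacing
    energies = energy(obs, action_grid[:, None])      # E(o, a) at every grid point
    Z = np.sum(np.exp(-energies)) * da                # approximate integral over actions
    return np.exp(-energy(obs, action[None, :])) / Z

obs = np.array([0.3])
action = np.array([0.1])
grid = np.linspace(-1.0, 1.0, 1001)                   # toy 1-D action space in [-1, 1]
print(ebm_prob(obs, action, grid))
```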