Generative Omnimatte: Learning to Decompose Video into Layers

Lee, Yao-Chih; Lu, Erika; Rumbley, Sarah; Geyer, Michal; Huang, Jia-Bin; Dekel, Tali; Cole, Forrester

arXiv:2411.16683 (cs)

[Submitted on 25 Nov 2024 (v1), last revised 24 Mar 2025 (this version, v2)]

Abstract: Given a video and a set of input object masks, an omnimatte method aims to decompose the video into semantically meaningful layers containing individual objects along with their associated effects, such as shadows and reflections. Existing omnimatte methods assume a static background or accurate pose and depth estimation, and they produce poor decompositions when these assumptions are violated. Furthermore, due to the lack of a generative prior on natural videos, existing methods cannot complete dynamic occluded regions. We present a novel generative layered video decomposition framework to address the omnimatte problem. Our method does not assume a stationary scene or require camera pose or depth information, and it produces clean, complete layers, including convincing completions of occluded dynamic regions. Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object. We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset, and we demonstrate high-quality decompositions and editing results for a wide range of casually captured videos containing soft shadows, glossy reflections, splashing water, and more.
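To make "decompose the video into layers" concrete: omnimatte layers are commonly represented as RGBA videos that, composited over a background, reproduce the input, so editing reduces to dropping or modifying a layer before recompositing. The sketch below shows only that recomposition step, not the paper's decomposition, which relies on a finetuned video diffusion model. It assumes non-premultiplied RGBA layers ordered back to front and the standard "over" operator; the function name `composite_layers`, the array shapes, and the toy shadow layer are hypothetical.

```python
import numpy as np

def composite_layers(background, layers):
    """Composite RGBA layers back to front over an RGB background video.

    background: float array of shape (T, H, W, 3), values in [0, 1].
    layers: list of float arrays of shape (T, H, W, 4), ordered back to front.
    Assumption (hypothetical layout): channels 0-2 are non-premultiplied RGB,
    channel 3 is alpha.
    """
    out = background.astype(np.float64).copy()
    for layer in layers:
        rgb = layer[..., :3]
        alpha = layer[..., 3:4]  # keep a trailing axis so alpha broadcasts over RGB
        # Standard "over" operator for non-premultiplied color.
        out = alpha * rgb + (1.0 - alpha) * out
    return out

# Toy example: a 2-frame, 4x4 gray background and a black, half-transparent
# "shadow" layer covering the top-left 2x2 region of every frame.
bg = np.full((2, 4, 4, 3), 0.2)
shadow = np.zeros((2, 4, 4, 4))
shadow[:, :2, :2, 3] = 0.5

with_shadow = composite_layers(bg, [shadow])   # shaded pixels become about 0.1
without_shadow = composite_layers(bg, [])      # removing the layer leaves bg unchanged
```

Because the shadow lives in its own layer, removing it (passing an empty list) restores the background exactly; this is the kind of effect-aware editing that a clean decomposition enables.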

Comments:	CVPR 2025. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.16683 [cs.CV]
	(or arXiv:2411.16683v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.16683

Submission history

From: Yao-Chih Lee
[v1] Mon, 25 Nov 2024 18:59:57 UTC (13,068 KB)
[v2] Mon, 24 Mar 2025 16:08:09 UTC (7,172 KB)
