LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

Cited by: 0

Authors:

Hu, Yaosi [1]

Chen, Zhenzhong [1]

Luo, Chong [2]

Affiliations:

[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2025

Funding:

National Natural Science Foundation of China;

Keywords:

Video generation; Video prediction; Diffusion model; Motion generation;

DOI:

10.1007/s11263-025-02386-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural movements while efficiently sampling videos. In this paper, we propose to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. Specifically, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion representation. Consequently, the diffusion-based motion generator is able to efficiently generate realistic motion on a continuous latent space under multi-modal conditions, at a cost that is similar to that of image diffusion models. Results show that LaMD generates high-quality videos on various benchmark datasets, including BAIR, Landscape, NATOPS, MUG and CATER-GEN, which encompass a variety of stochastic dynamics and highly controllable movements across multiple image-conditional video generation tasks, while significantly decreasing sampling time.


Pages: 17
