DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2
Authors
Hong, Fa-Ting [1]
Shen, Li [2]
Xu, Dan [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Alibaba Grp, Hangzhou 310052, Peoples R China
Keywords
Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;
DOI
10.1109/TPAMI.2023.3339964
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise during generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, first, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations during training. We further propose a strategy to learn pixel-level uncertainties so that more reliable rigid-motion pixels are used for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module that provides accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied at each generation layer to capture facial geometry in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework generates highly realistic reenacted talking videos and establishes new state-of-the-art performance on these benchmarks.
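The 3D-aware cross-modal attention mentioned in the abstract can be pictured with a minimal PyTorch sketch. This is an assumption-laden illustration rather than the authors' implementation: the module name CrossModalAttention, the channel sizes, and the choice of computing queries from depth features while keys and values come from appearance features are hypothetical, chosen only to show how a learned depth map could re-weight appearance features at one generation layer.

# Illustrative sketch only (not the paper's code): cross-attention between an
# appearance feature map and a depth feature map at a single generation layer.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical depth-appearance cross-attention block.

    Queries come from depth features and keys/values from appearance features,
    so estimated geometry can re-weight appearance at every spatial location.
    """
    def __init__(self, appearance_channels: int, depth_channels: int, embed_dim: int = 64):
        super().__init__()
        self.to_q = nn.Conv2d(depth_channels, embed_dim, kernel_size=1)
        self.to_k = nn.Conv2d(appearance_channels, embed_dim, kernel_size=1)
        self.to_v = nn.Conv2d(appearance_channels, appearance_channels, kernel_size=1)
        self.scale = embed_dim ** -0.5

    def forward(self, appearance_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # appearance_feat: (B, C_a, H, W); depth_feat: (B, C_d, H, W)
        b, c, h, w = appearance_feat.shape
        q = self.to_q(depth_feat).flatten(2).transpose(1, 2)       # (B, HW, E)
        k = self.to_k(appearance_feat).flatten(2)                  # (B, E, HW)
        v = self.to_v(appearance_feat).flatten(2).transpose(1, 2)  # (B, HW, C_a)
        attn = torch.softmax(q @ k * self.scale, dim=-1)           # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)       # back to (B, C_a, H, W)
        return appearance_feat + out                               # residual connection

if __name__ == "__main__":
    # Toy usage: 32x32 feature maps with 256 appearance and 64 depth channels.
    block = CrossModalAttention(appearance_channels=256, depth_channels=64)
    a = torch.randn(1, 256, 32, 32)
    d = torch.randn(1, 64, 32, 32)
    print(block(a, d).shape)  # torch.Size([1, 256, 32, 32])

Applied at several resolutions of the generator, such a block would provide the coarse-to-fine, geometry-guided modulation described in the abstract; the actual DaGAN++ attention design may differ.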
Pages: 2997-3012
Page count: 16