DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2

Authors:

Hong, Fa-Ting [1]

Shen, Li [2]

Xu, Dan [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[2] Alibaba Grp, Hangzhou 310052, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 05

Keywords:

Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;

DOI:

10.1109/TPAMI.2023.3339964

CLC (Chinese Library Classification) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise during generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, we first present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations during training. We further propose a strategy to learn pixel-level uncertainties so that the model can identify more reliable rigid-motion pixels for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module that provides accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic reenacted talking videos, establishing new state-of-the-art performance on these benchmarks.
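The abstract names two mechanisms without giving their equations. The NumPy sketch below illustrates them under stated assumptions: (a) a pixel-wise uncertainty-weighted photometric loss of the Laplacian form |I - I_hat| / sigma + log(sigma), a formulation common in self-supervised depth estimation, and (b) scaled dot-product cross-modal attention in which depth features supply the queries and appearance features supply the keys and values. Both formulations, the function names, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical illustrations, not the authors' implementation; DaGAN++ may differ in normalization, feature shapes, and how the attention output is fused into each generation layer.

```python
import numpy as np


def uncertainty_weighted_photometric_loss(target, reconstructed, sigma, eps=1e-6):
    """Assumed Laplacian-style loss: mean of |I - I_hat| / sigma + log(sigma).

    Pixels with a large predicted sigma contribute less to the residual term,
    while log(sigma) penalizes declaring every pixel uncertain.
    This is an illustrative formulation, not taken from the paper.
    """
    sigma = np.maximum(sigma, eps)  # keep per-pixel uncertainty strictly positive
    residual = np.abs(target - reconstructed)
    return float(np.mean(residual / sigma + np.log(sigma)))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(depth_feat, app_feat, w_q, w_k, w_v):
    """Scaled dot-product attention: depth -> queries, appearance -> keys/values.

    depth_feat: (N, Cd) depth features at N flattened spatial positions
    app_feat:   (N, Ca) appearance features at the same positions
    w_q: (Cd, d), w_k: (Ca, d), w_v: (Ca, Ca) hypothetical projection matrices
    """
    q = depth_feat @ w_q
    k = app_feat @ w_k
    v = app_feat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) similarity between positions
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.standard_normal((16, 8))  # 16 positions, 8 depth channels (toy sizes)
    app = rng.standard_normal((16, 32))   # 16 positions, 32 appearance channels
    out, attn = cross_modal_attention(
        depth, app,
        rng.standard_normal((8, 4)),
        rng.standard_normal((32, 4)),
        rng.standard_normal((32, 32)),
    )
    print(out.shape)  # (16, 32)
```

In this sketch, down-weighting high-uncertainty pixels lets the depth network rely mostly on pixels whose motion is well explained by a rigid head transform, which is one plausible reading of the "reliable rigid-motion pixels" described above.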

Pages: 2997-3012

Page count: 16
