DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2
Authors
Hong, Fa-Ting [1]
Shen, Li [2]
Xu, Dan [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Alibaba Grp, Hangzhou 310052, Peoples R China
Keywords
Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;
DOI
10.1109/TPAMI.2023.3339964
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise for generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, first, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performance established on these benchmarks.
Pages: 2997-3012
Number of pages: 16