DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2
Authors
Hong, Fa-Ting [1]
Shen, Li [2]
Xu, Dan [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Alibaba Grp, Hangzhou 310052, Peoples R China
Keywords
Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;
DOI
10.1109/TPAMI.2023.3339964
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise during generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, first, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations during training. We further propose a strategy to learn pixel-level uncertainties so that more reliable rigid-motion pixels are used for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module that provides accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied at each generation layer to capture facial geometry in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework generates highly realistic reenacted talking videos and establishes new state-of-the-art performance on these benchmarks.
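The 3D-aware cross-modal attention mentioned in the abstract can be pictured with a minimal PyTorch sketch. This is an assumption-laden illustration rather than the authors' implementation: the module name CrossModalAttention, the channel sizes, and the choice of computing queries from depth features while keys and values come from appearance features are hypothetical, chosen only to show how a learned depth map could re-weight appearance features at one generation layer.

# Illustrative sketch only (not the paper's code): cross-attention between an
# appearance feature map and a depth feature map at a single generation layer.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical depth-appearance cross-attention block.

    Queries come from depth features and keys/values from appearance features,
    so estimated geometry can re-weight appearance at every spatial location.
    """
    def __init__(self, appearance_channels: int, depth_channels: int, embed_dim: int = 64):
        super().__init__()
        self.to_q = nn.Conv2d(depth_channels, embed_dim, kernel_size=1)
        self.to_k = nn.Conv2d(appearance_channels, embed_dim, kernel_size=1)
        self.to_v = nn.Conv2d(appearance_channels, appearance_channels, kernel_size=1)
        self.scale = embed_dim ** -0.5

    def forward(self, appearance_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # appearance_feat: (B, C_a, H, W); depth_feat: (B, C_d, H, W)
        b, c, h, w = appearance_feat.shape
        q = self.to_q(depth_feat).flatten(2).transpose(1, 2)       # (B, HW, E)
        k = self.to_k(appearance_feat).flatten(2)                  # (B, E, HW)
        v = self.to_v(appearance_feat).flatten(2).transpose(1, 2)  # (B, HW, C_a)
        attn = torch.softmax(q @ k * self.scale, dim=-1)           # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)       # back to (B, C_a, H, W)
        return appearance_feat + out                               # residual connection

if __name__ == "__main__":
    # Toy usage: 32x32 feature maps with 256 appearance and 64 depth channels.
    block = CrossModalAttention(appearance_channels=256, depth_channels=64)
    a = torch.randn(1, 256, 32, 32)
    d = torch.randn(1, 64, 32, 32)
    print(block(a, d).shape)  # torch.Size([1, 256, 32, 32])

Applied at several resolutions of the generator, such a block would provide the coarse-to-fine, geometry-guided modulation described in the abstract; the actual DaGAN++ attention design may differ.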
Pages: 2997-3012
Page count: 16