DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2
Authors
Hong, Fa-Ting [1 ]
Shen, Li [2 ]
Xu, Dan [1 ]
Institutions
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Alibaba Grp, Hangzhou 310052, Peoples R China
Keywords
Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;
DOI
10.1109/TPAMI.2023.3339964
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise during generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, first, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to identify more reliable rigid-motion pixels for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic reenacted talking videos, establishing new state-of-the-art performance on these benchmarks.
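The cross-modal attention described in the abstract uses one modality (depth) to query another (appearance). A minimal NumPy sketch of this general idea is shown below; it is not the paper's implementation, and all function names, shapes, and projection matrices are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(appearance, depth, Wq, Wk, Wv):
    """Depth-queried attention over appearance features (illustrative sketch).

    appearance: (N, C) flattened spatial appearance features
    depth:      (N, C) flattened spatial depth features
    Wq, Wk, Wv: (C, C) learned projection matrices (random here)
    Returns an (N, C) geometry-attended feature map.
    """
    q = depth @ Wq          # queries come from the depth (geometry) branch
    k = appearance @ Wk     # keys come from the appearance branch
    v = appearance @ Wv     # values come from the appearance branch
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v

# Toy example: 4 spatial positions, 8 feature channels.
rng = np.random.default_rng(0)
N, C = 4, 8
app = rng.normal(size=(N, C))
dep = rng.normal(size=(N, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
out = cross_modal_attention(app, dep, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

In the paper's coarse-to-fine setting, such a module would be applied at each generation layer's resolution; this sketch only shows a single flattened feature map.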
Pages: 2997-3012
Page count: 16