DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Cited by: 2

Authors:

Hong, Fa-Ting [1]

Shen, Li [2]

Xu, Dan [1]

Affiliations:

[1] Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong, People's Republic of China

[2] Alibaba Group, Hangzhou 310052, People's Republic of China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / Issue 05

Keywords:

Faces; Head; Three-dimensional displays; Geometry; Magnetic heads; Estimation; Annotations; Talking head generation; self-supervised facial depth estimation; geometry-guided video generation; IMAGE;

DOI:

10.1109/TPAMI.2023.3339964

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise for generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this paper, first, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters or 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Second, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performance established on these benchmarks.


Pages: 2997-3012

Page count: 16
