Spatially and Temporally Optimized Audio-Driven Talking Face Generation

Times Cited: 0
Authors
Dong, Biao [1 ]
Ma, Bo-Yao [1 ]
Zhang, Lei [1 ]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
NETWORK;
DOI
10.1111/cgf.15228
CLC Number (Chinese Library Classification)
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which degrades lip sync accuracy. In addition, the loss of facial details during image reconstruction often introduces visual artifacts into the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar lip appearance changes. This allows the lip movement to be adaptively adjusted for accurate sync. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules and regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method generates more natural talking faces than previous state-of-the-art methods, with both more accurate lip sync and more realistic facial details.
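The abstract describes the two consistency constraints only at a high level. As a loose, illustrative sketch rather than the paper's actual method, the PyTorch-style code below shows one way such temporal and spatial consistency terms could be expressed as auxiliary training losses; the function names, tensor shapes, and the patch-statistics formulation are assumptions made here for illustration only.

import torch
import torch.nn.functional as F

def temporal_consistency_loss(lip_frames, phoneme_ids):
    # lip_frames: (T, C, H, W) cropped lip regions; phoneme_ids: (T,) integer labels.
    # Consecutive frames sharing a phoneme label play the role of a "temporal module".
    diffs = lip_frames[1:] - lip_frames[:-1]             # frame-to-frame appearance change
    same = phoneme_ids[1:] == phoneme_ids[:-1]           # pairs that stay inside one phoneme
    if same.sum() < 2:
        return lip_frames.new_zeros(())
    d = diffs[same].flatten(1)
    # Penalize the deviation of each within-phoneme change from the mean change,
    # a crude stand-in for "similar lip appearance changes" within a module.
    return ((d - d.mean(dim=0, keepdim=True)) ** 2).mean()

def spatial_consistency_loss(fake, real, patch=16):
    # fake, real: (B, C, H, W) generated and ground-truth face images.
    # Non-overlapping local patches play the role of "spatial modules"; matching their
    # per-patch mean/std loosely encodes "uniform texture distribution within local regions".
    def patch_stats(x):
        p = F.unfold(x, kernel_size=patch, stride=patch)  # (B, C*patch*patch, L)
        return p.mean(dim=1), p.std(dim=1)
    fm, fs = patch_stats(fake)
    rm, rs = patch_stats(real)
    return F.l1_loss(fm, rm) + F.l1_loss(fs, rs)

In such a setup, both terms would simply be added (with suitable weights) to the generator's usual reconstruction and lip-sync losses; the actual module design and weighting used in the paper are not specified in the abstract.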
Pages: 11
Related Papers
Total: 50 records
  • [31] Audio-Driven Stylized Gesture Generation with Flow-Based Model
    Ye, Sheng
    Wen, Yu-Hui
    Sun, Yanan
    He, Ying
    Zhang, Ziyang
    Wang, Yaoyuan
    He, Weihua
    Liu, Yong-Jin
    COMPUTER VISION - ECCV 2022, PT V, 2022, 13665 : 712 - 728
  • [32] Let's Play Music: Audio-driven Performance Video Generation
    Zhu, Hao
    Li, Yi
    Zhu, Feixia
    Zheng, Aihua
    He, Ran
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3574 - 3581
  • [33] Photorealistic Audio-driven Video Portraits
    Wen, Xin
    Wang, Miao
    Richardt, Christian
    Chen, Ze-Yin
    Hu, Shi-Min
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26 (12) : 3457 - 3466
  • [34] Audio-Driven Emotional Video Portraits
    Ji, Xinya
    Zhou, Hang
    Wang, Kaisiyuan
    Wu, Wayne
    Loy, Chen Change
    Cao, Xun
    Xu, Feng
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 14075 - 14084
  • [35] Audio-Driven Laughter Behavior Controller
    Ding, Yu
    Huang, Jing
    Pelachaud, Catherine
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2017, 8 (04) : 546 - 558
  • [36] Voice2Face: Audio-driven Facial and Tongue Rig Animations with cVAEs
    Aylagas, Monica Villanueva
    Leon, Hector Anadon
    Teye, Mattias
    Tollmar, Konrad
    COMPUTER GRAPHICS FORUM, 2022, 41 (08) : 255 - 265
  • [37] Talking Face Generation With Audio-Deduced Emotional Landmarks
    Zhai, Shuyan
    Liu, Meng
    Li, Yongqiang
    Gao, Zan
    Zhu, Lei
    Nie, Liqiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14099 - 14111
  • [38] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
    Zhu, Lingting
    Liu, Xian
    Liu, Xuanyu
    Qian, Rui
    Liu, Ziwei
    Yu, Lequan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10544 - 10553
  • [39] Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
    Liu, Xian
    Xu, Yinghao
    Wu, Qianyi
    Zhou, Hang
    Wu, Wayne
    Zhou, Bolei
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 106 - 125
  • [40] Multimodal Learning for Temporally Coherent Talking Face Generation With Articulator Synergy
    Yu, Lingyun
    Xie, Hongtao
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2950 - 2962