Audio-Driven Lips and Expression on 3D Human Face

Cited by: 0

Authors:

Ma, Le [1,2]

Ma, Zhihao [1,2]

Meng, Weiliang [1,2]

Xu, Shibiao [3]

Zhang, Xiaopeng [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT II | 2024 / Vol. 14496

Funding:

National Natural Science Foundation of China;

Keywords:

Face Expression; Lips Movement; Fusion; DATABASE;

DOI:

10.1007/978-3-031-50072-5_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Extensive research has delved into audio-driven 3D facial animation with numerous attempts to achieve human-like performance. However, creating truly realistic and expressive 3D facial animations remains a challenging task, as existing methods often struggle to capture the subtle nuances of anthropomorphic expressions. We propose the Audio-Driven Lips and Expression (ADLE) method, specifically designed to generate highly expressive and lifelike conversations between individuals, complete with essential social signals like laughter and excitement, solely based on audio cues. The foundation of our approach lies in the revolutionary audio-expression-consistency strategy, which effectively disentangles person-specific lip movements from dependent facial expressions. As a result, our ADLE robustly learns lip movements and generic expression parameters on a 3D human face from an audio sequence. This represents a powerful multimodal fusion approach capable of generating accurate lip movements paired with vivid facial expressions on a 3D face, all in real time. Experiments validate that our ADLE outperforms other state-of-the-art works in this field, making it a highly promising approach for a wide range of applications.

Pages: 15 - 26

Page count: 12

50 items in total

[1] EmoFace: Audio-driven Emotional 3D Face Animation
Liu, Chang
Lin, Qunfen
Zeng, Zijiao
Pan, Ye
2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES, VR 2024, 2024 : 387 - 397
[2] Audio-Driven Talking Face Generation: A Review
Liu, Shiguang
JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2023, 71 (7-8) : 408 - 419
[3] Audio-driven Talking Head Generation with Transformer and 3D Morphable Model
Huang, Ricong
Zhong, Weizhi
Li, Guanbin
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 7035 - 7039
[4] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Zhang, Wenxuan
Cun, Xiaodong
Wang, Xuan
Zhang, Yong
Shen, Xi
Guo, Yu
Shan, Ying
Wang, Fei
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 8652 - 8661
[5] Audio-driven Talking Face Video Generation with Emotion
Liang, Jiadong
Lu, Feng
2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW 2024, 2024 : 863 - 864
[6] UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
Fan, Xiangyu
Li, Jiaqi
Lin, Zhiqian
Xiao, Weiye
Yang, Lei
COMPUTER VISION - ECCV 2024, PT XLI, 2025, 15099 : 204 - 221
[7] EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
Qi, Xingqun
Liu, Chen
Li, Lincheng
Hou, Jie
Xin, Haoran
Yu, Xin
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10420 - 10430
[8] Personalized Audio-Driven 3D Facial Animation via Style-Content Disentanglement
Chai, Yujin
Shao, Tianjia
Weng, Yanlin
Zhou, Kun
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (03) : 1803 - 1820
[9] Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
Huang, Ricong
Lai, Peiwen
Qin, Yipeng
Li, Guanbin
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 12759 - 12768
[10] Spatially and Temporally Optimized Audio-Driven Talking Face Generation
Dong, Biao
Ma, Bo-Yao
Zhang, Lei
COMPUTER GRAPHICS FORUM, 2024, 43 (07)
