Audio-Driven Lips and Expression on 3D Human Face

Times cited: 0

Authors:

Ma, Le [1, 2]

Ma, Zhihao [1, 2]

Meng, Weiliang [1, 2]

Xu, Shibiao [3]

Zhang, Xiaopeng [1, 2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT II | 2024 / Vol. 14496

Funding:

National Natural Science Foundation of China

Keywords:

Face Expression; Lips Movement; Fusion; DATABASE;

DOI:

10.1007/978-3-031-50072-5_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Extensive research has explored audio-driven 3D facial animation, with numerous attempts to achieve human-like performance. However, creating truly realistic and expressive 3D facial animation remains challenging, as existing methods often struggle to capture the subtle nuances of anthropomorphic expressions. We propose the Audio-Driven Lips and Expression (ADLE) method, designed to generate highly expressive and lifelike conversations between individuals, complete with essential social signals such as laughter and excitement, solely from audio cues. The foundation of our approach is a novel audio-expression-consistency strategy, which effectively disentangles person-specific lip movements from dependent facial expressions. As a result, ADLE robustly learns lip movements and generic expression parameters on a 3D human face from an audio sequence. This multimodal fusion approach generates accurate lip movements paired with vivid facial expressions on a 3D face, all in real time. Experiments validate that ADLE outperforms other state-of-the-art methods in this field, making it a highly promising approach for a wide range of applications.


Pages: 15–26

Page count: 12