Spatially and Temporally Optimized Audio-Driven Talking Face Generation

Cited by: 0
Authors
Dong, Biao [1 ]
Ma, Bo-Yao [1 ]
Zhang, Lei [1 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
NETWORK;
DOI
10.1111/cgf.15228
CLC Classification
TP31 [Computer Software];
Discipline Classification Code
081202 ; 0835 ;
Abstract
Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip-sync accuracy; moreover, the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar changes in lip appearance. This allows the lip movement to be adjusted adaptively for accurate synchronization. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules that regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method generates more natural talking faces than previous state-of-the-art methods, in terms of both lip-sync accuracy and realism of facial detail.
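The abstract only describes the two consistency terms at a high level. As a purely illustrative aid, the sketch below shows one generic way such spatio-temporal consistency losses could be written in PyTorch: a temporal term that matches frame-to-frame lip changes within a phoneme segment, and a spatial term that matches per-patch texture statistics. The function names, tensor shapes, patch size, and exact loss formulation are assumptions for illustration and do not reproduce the authors' implementation.

```python
# Illustrative sketch only: generic spatio-temporal consistency losses inspired by
# the abstract's description. Shapes, names, and formulations are hypothetical.
import torch
import torch.nn.functional as F


def temporal_consistency_loss(gen_frames: torch.Tensor,
                              ref_frames: torch.Tensor) -> torch.Tensor:
    """Encourage generated frame-to-frame changes within one phoneme segment
    to match the reference changes. Both inputs: [T, C, H, W] lip-region crops."""
    gen_delta = gen_frames[1:] - gen_frames[:-1]   # motion between consecutive frames
    ref_delta = ref_frames[1:] - ref_frames[:-1]
    return F.l1_loss(gen_delta, ref_delta)


def spatial_consistency_loss(gen_frame: torch.Tensor,
                             ref_frame: torch.Tensor,
                             patch: int = 16) -> torch.Tensor:
    """Match local texture statistics (per-patch mean and std) between a
    generated frame and its reference. Both inputs: [C, H, W]."""
    def patch_stats(x: torch.Tensor):
        # Split into non-overlapping patches: [C, n_patches, patch * patch]
        p = x.unfold(1, patch, patch).unfold(2, patch, patch)
        p = p.contiguous().view(x.shape[0], -1, patch * patch)
        return p.mean(dim=-1), p.std(dim=-1)

    g_mean, g_std = patch_stats(gen_frame)
    r_mean, r_std = patch_stats(ref_frame)
    return F.l1_loss(g_mean, r_mean) + F.l1_loss(g_std, r_std)


# Example usage with random tensors standing in for lip-region crops of one phoneme.
T, C, H, W = 5, 3, 64, 64
gen = torch.rand(T, C, H, W)
ref = torch.rand(T, C, H, W)
loss = temporal_consistency_loss(gen, ref) + spatial_consistency_loss(gen[0], ref[0])
print(float(loss))
```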
Pages: 11
Related Papers
50 records in total
  • [41] Emotion-Aware Audio-Driven Face Animation via Contrastive Feature Disentanglement. Ren, Xin; Luo, Juan; Zhong, Xionghu; Cai, Minjie. INTERSPEECH 2023, 2023: 2728-2732.
  • [42] Expressive talking face generation via audio visual control. Pengfei Li; Huihuang Zhao; Mugang Lin; Qingyun Liu; Peng Tang; Yangfan Zhou. Multimedia Systems, 2025, 31 (3).
  • [43] Mining Audio, Text and Visual Information for Talking Face Generation. Yu, Lingyun; Yu, Jun; Ling, Qiang. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019: 787-795.
  • [44] Audio-Driven Multimedia Content Authentication as a Service. Vryzas, Nikolaos; Katsaounidou, Anastasia; Kotsakis, Rigas; Dimoulas, Charalampos; Kalliris, George. 146TH AES CONVENTION, 2019.
  • [45] DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. Yang, Sicheng; Wu, Zhiyong; Li, Minglei; Zhang, Zhensong; Hao, Lei; Bao, Weihong; Cheng, Ming; Xiao, Long. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 5860-5868.
  • [46] SynGauss: Real-Time 3D Gaussian Splatting for Audio-Driven Talking Head Synthesis. Zhou, Zhanyi; Feng, Quandong; Li, Hongjun. IEEE ACCESS, 2025, 13: 42167-42177.
  • [47] Audio-Driven Facial Animation with Deep Learning: A Survey. Jiang, Diqiong; Chang, Jian; You, Lihua; Bian, Shaojun; Kosk, Robert; Maguire, Greg. INFORMATION, 2024, 15 (11).
  • [48] Talking Face Generation by Adversarially Disentangled Audio-Visual Representation. Zhou, Hang; Liu, Yu; Liu, Ziwei; Luo, Ping; Wang, Xiaogang. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 9299-9306.
  • [49] Spatially and Temporally Optimized Video Stabilization. Wang, Yu-Shuen; Liu, Feng; Hsu, Pu-Sheng; Lee, Tong-Yee. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2013, 19 (08): 1354-1361.
  • [50] Touch the Sound: Audio-Driven Tactile Feedback for Audio Mixing Applications. Merchel, Sebastian; Altinsoy, M. Ercan; Stamm, Maik. JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2012, 60 (1-2): 47-53.