Spatially and Temporally Optimized Audio-Driven Talking Face Generation

Cited by: 0
Authors
Dong, Biao [1]
Ma, Bo-Yao [1]
Zhang, Lei [1]
Affiliation
[1] Beijing Institute of Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
NETWORK;
DOI
10.1111/cgf.15228
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip sync accuracy. In addition, the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar changes in lip appearance. This allows lip movement to be adjusted adaptively for accurate sync. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules that regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that the method generates more natural talking faces than previous state-of-the-art methods, with both more accurate lip sync and more realistic facial details.
Pages: 11