Parallel and High-Fidelity Text-to-Lip Generation

Cited by: 0
Authors
Liu, Jinglin [1]
Zhu, Zhiying [1]
Ren, Yi [1]
Huang, Wencan [1]
Huai, Baoxing [2]
Yuan, Nicholas [2]
Zhao, Zhou [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Huawei Cloud, Hong Kong, Peoples R China
Funding
Natural Science Foundation of Zhejiang Province; National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a key component of talking face generation, lip movement generation determines the naturalness and coherence of the generated talking face video. Prior literature mainly focuses on speech-to-lip generation, while text-to-lip (T2L) generation has received little attention. T2L is a challenging task, and existing end-to-end works depend on the attention mechanism and autoregressive (AR) decoding. However, AR decoding generates the current lip frame conditioned on previously generated frames, which inherently limits inference speed and also degrades the quality of generated lip frames through error propagation. This motivates research on parallel T2L generation. In this work, we propose ParaLip, a parallel decoding model for fast and high-fidelity text-to-lip generation. Specifically, we predict the duration of the encoded linguistic features and model the target lip frames conditioned on the encoded linguistic features and their durations in a non-autoregressive manner. Furthermore, we incorporate the structural similarity index (SSIM) loss and adversarial learning to improve the perceptual quality of generated lip frames and alleviate the blurry prediction problem. Extensive experiments on the GRID and TCD-TIMIT datasets demonstrate the superiority of the proposed methods.
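The duration-based conditioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expand_by_duration`, the toy feature vectors, and the integer durations are assumptions made for illustration only. It shows the general length-regulation idea commonly used in non-autoregressive generation, where each encoded token feature is repeated for its predicted number of frames so that all frame-level conditions are available at once.

```python
from typing import List, Sequence


def expand_by_duration(
    features: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token-level feature vector durations[i] times.

    Hypothetical helper: the result is one conditioning vector per target
    lip frame, so a decoder could process all frames in parallel.
    """
    if len(features) != len(durations):
        raise ValueError("features and durations must have the same length")
    frames: List[List[float]] = []
    for feature, duration in zip(features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # A duration of 0 means the token contributes no frame.
        frames.extend(list(feature) for _ in range(duration))
    return frames


# Toy example (assumed values): three encoded tokens, two dimensions each.
token_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
predicted_durations = [2, 0, 3]

frame_conditions = expand_by_duration(token_features, predicted_durations)
print(len(frame_conditions))  # total number of frames = sum of durations
```

Because the number of output frames is fixed by the durations before decoding starts, no frame depends on a previously generated frame, which is what removes the sequential bottleneck and error propagation attributed to AR decoding above.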
Pages: 1738-1746
Page count: 9