Parallel and High-Fidelity Text-to-Lip Generation

Cited: 0
Authors
Liu, Jinglin [1 ]
Zhu, Zhiying [1 ]
Ren, Yi [1 ]
Huang, Wencan [1 ]
Huai, Baoxing [2 ]
Yuan, Nicholas [2 ]
Zhao, Zhou [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Huawei Cloud, Hong Kong, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Key R&D Program of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
As a key component of talking face generation, lip movement generation determines the naturalness and coherence of the generated talking face video. Prior literature focuses mainly on speech-to-lip generation, while text-to-lip (T2L) generation remains underexplored. T2L is a challenging task, and existing end-to-end works depend on attention mechanisms and an autoregressive (AR) decoding scheme. However, AR decoding generates each lip frame conditioned on the previously generated frames, which inherently limits inference speed and also degrades the quality of the generated lip frames due to error propagation. This motivates research on parallel T2L generation. In this work, we propose ParaLip, a parallel decoding model for fast and high-fidelity text-to-lip generation. Specifically, we predict the duration of the encoded linguistic features and model the target lip frames conditioned on the encoded linguistic features and their durations in a non-autoregressive manner. Furthermore, we incorporate a structural similarity index (SSIM) loss and adversarial learning to improve the perceptual quality of the generated lip frames and alleviate the blurry-prediction problem. Extensive experiments on the GRID and TCD-TIMIT datasets demonstrate the superiority of the proposed method.
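To make the decoding scheme described in the abstract concrete, the sketch below illustrates, in plain PyTorch, two of the ideas mentioned: expanding encoded linguistic features by their predicted durations so that all lip frames can be decoded in parallel, and an SSIM-based reconstruction loss. This is an illustrative sketch only, not the authors' implementation; the function names, the simplified global-statistics SSIM, and all tensor sizes are assumptions made for the example.

```python
# Illustrative sketch only (not the authors' released code). It shows two
# ingredients named in the abstract: (1) a length regulator that repeats each
# encoded linguistic feature by its predicted duration so every lip frame can
# be decoded in parallel, and (2) a simplified SSIM loss. All sizes and names
# here are assumptions for the example.
import torch


def length_regulate(text_feats: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Expand [T_text, H] linguistic features into [T_frames, H] frame-level
    features by repeating position i `durations[i]` times."""
    return torch.repeat_interleave(text_feats, durations, dim=0)


def ssim_loss(x: torch.Tensor, y: torch.Tensor,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Simplified SSIM loss (global image statistics rather than the usual
    sliding window) between two image batches with values in [0, 1]."""
    mu_x, mu_y = x.mean(dim=(-2, -1)), y.mean(dim=(-2, -1))
    dx = x - mu_x[..., None, None]
    dy = y - mu_y[..., None, None]
    var_x = (dx ** 2).mean(dim=(-2, -1))
    var_y = (dy ** 2).mean(dim=(-2, -1))
    cov = (dx * dy).mean(dim=(-2, -1))
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()


# Usage: 12 encoded phonemes with hidden size 256 (assumed numbers), expanded
# into 50 frame-level features that a non-autoregressive decoder could turn
# into 50 lip frames in one parallel pass.
encoder_out = torch.randn(12, 256)
pred_durations = torch.tensor([5, 3, 4, 6, 2, 5, 4, 3, 6, 4, 5, 3])
frame_feats = length_regulate(encoder_out, pred_durations)
print(frame_feats.shape)  # torch.Size([50, 256])

# SSIM loss between predicted and ground-truth lip frames (batch of 4 RGB 64x64).
pred = torch.rand(4, 3, 64, 64)
target = torch.rand(4, 3, 64, 64)
print(ssim_loss(pred, target).item())
```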
Pages: 1738-1746
Number of pages: 9