Parallel and High-Fidelity Text-to-Lip Generation

Cited: 0
Authors
Liu, Jinglin [1 ]
Zhu, Zhiying [1 ]
Ren, Yi [1 ]
Huang, Wencan [1 ]
Huai, Baoxing [2 ]
Yuan, Nicholas [2 ]
Zhao, Zhou [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Huawei Cloud, Hong Kong, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Key R&D Program of China;
Keywords
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a key component of talking face generation, lip movement generation determines the naturalness and coherence of the generated talking face video. Prior literature mainly focuses on speech-to-lip generation, while text-to-lip (T2L) generation remains under-explored. T2L is a challenging task, and existing end-to-end works depend on an attention mechanism and an autoregressive (AR) decoding manner. However, AR decoding generates the current lip frame conditioned on previously generated frames, which inherently limits inference speed and also degrades the quality of the generated lip frames due to error propagation. This motivates research on parallel T2L generation. In this work, we propose a parallel decoding model for fast and high-fidelity text-to-lip generation (ParaLip). Specifically, we predict the duration of the encoded linguistic features and model the target lip frames conditioned on the encoded linguistic features and their durations in a non-autoregressive manner. Furthermore, we incorporate a structural similarity index (SSIM) loss and adversarial learning to improve the perceptual quality of the generated lip frames and alleviate the blurry-prediction problem. Extensive experiments on the GRID and TCD-TIMIT datasets demonstrate the superiority of the proposed method.
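The abstract's core idea, expanding each encoded linguistic feature according to its predicted duration so the whole target frame sequence can be decoded in parallel, can be sketched with a minimal length-regulation step. This is an illustrative NumPy sketch, assuming a FastSpeech-style length regulator; the function name and toy inputs are hypothetical, not taken from the paper.

```python
import numpy as np

def length_regulate(features: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Repeat each linguistic feature vector `durations[i]` times so the
    expanded sequence aligns one-to-one with the target lip frames,
    allowing all frames to be predicted in parallel (non-autoregressively)."""
    return np.repeat(features, durations, axis=0)

# Toy example: 3 linguistic tokens with feature dim 2.
phoneme_feats = np.arange(6, dtype=float).reshape(3, 2)
durations = np.array([2, 1, 3])  # predicted number of frames per token

frames = length_regulate(phoneme_feats, durations)
print(frames.shape)  # → (6, 2): 2 + 1 + 3 target frame positions
```

Because every expanded position is available at once, a decoder can condition on the full expanded sequence and emit all lip frames simultaneously, avoiding the error propagation of frame-by-frame AR decoding.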
Pages: 1738-1746
Page count: 9