Language-Guided Face Animation by Recurrent StyleGAN-Based Generator

Cited by: 6
Authors
Hang, Tiankai [1 ,2 ,3 ]
Yang, Huan [1 ]
Liu, Bei [1 ]
Fu, Jianlong [1 ]
Geng, Xin [2 ,3 ]
Guo, Baining [1 ,2 ,3 ]
Affiliations
[1] Microsoft Res Asia, Beijing 100080, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Peoples R China
[3] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 211189, Peoples R China
Keywords
Cross-modality; face animation; video synthesis
DOI
10.1109/TMM.2023.3248143
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Recent works on language-guided image manipulation have shown the great power of language in providing rich semantics, especially for face images. However, another kind of natural information in language, motion, is less explored. In this article, we leverage this motion information and study a novel task, language-guided face animation, which aims to animate a static face image with the guidance of language. To better utilize both semantics and motions from language, we propose a simple yet effective framework. Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it, along with visual information, to a pre-trained StyleGAN to generate high-quality frames. To optimize the proposed framework, three carefully designed loss functions are introduced: a regularization loss to preserve the face identity, a path length regularization loss to ensure motion smoothness, and a contrastive loss to enable video synthesis under various language guidance within a single model. Extensive experiments with both qualitative and quantitative evaluations on diverse domains (e.g., human faces, anime faces, and dog faces) demonstrate the superiority of our model in generating high-quality and realistic videos from a single still image under the guidance of language.
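The pipeline the abstract describes, a recurrent generator emitting per-frame latent offsets that are added to the inverted latent of the input face and decoded by StyleGAN, can be sketched in toy form. This is an illustrative sketch only, not the authors' implementation: the dimensions, the single-layer tanh recurrence, the random stand-in embeddings, and the latent-space proxies for the identity and smoothness losses are all assumptions made for brevity (the paper's losses operate with a full StyleGAN and a language encoder such as CLIP).

```python
import numpy as np

rng = np.random.default_rng(0)
D_TEXT, D_LATENT, T = 16, 32, 8  # toy dimensions, not the paper's

# Toy "recurrent motion generator": one recurrent cell that, conditioned on
# a text embedding, unrolls over time and emits one latent offset per frame.
W_h = rng.normal(scale=0.1, size=(D_LATENT, D_LATENT))  # hidden-to-hidden
W_t = rng.normal(scale=0.1, size=(D_LATENT, D_TEXT))    # text-to-hidden
W_o = rng.normal(scale=0.1, size=(D_LATENT, D_LATENT))  # hidden-to-offset

def motion_offsets(text_emb, n_frames):
    """Unroll the recurrence; return (n_frames, D_LATENT) latent offsets."""
    h = np.zeros(D_LATENT)
    offsets = []
    for _ in range(n_frames):
        h = np.tanh(W_h @ h + W_t @ text_emb)
        offsets.append(W_o @ h)
    return np.stack(offsets)

w0 = rng.normal(size=D_LATENT)    # stand-in: inverted latent of the face image
text = rng.normal(size=D_TEXT)    # stand-in: language embedding of the prompt

# One StyleGAN latent per frame; a pre-trained generator would decode each.
latents = w0 + motion_offsets(text, T)

# Latent-space proxies for two of the three losses named in the abstract:
identity_reg = np.mean((latents - w0) ** 2)          # keep the face identity
smoothness = np.mean(np.diff(latents, axis=0) ** 2)  # motion smoothness
```

The contrastive loss is omitted here because it needs batches of (video, language) pairs; its role in the paper is to let one model handle many different language guidances.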
Pages: 9216-9227
Number of pages: 12