Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Citations: 0
Authors
Srinivasa, Anagha [1 ]
Praveen, Anjali [1 ]
Mavathur, Anusha [1 ]
Pothumarthi, Apurva [1 ]
Arya, Arti [1 ]
Agarwal, Pooja [1 ]
Affiliation
[1] PES Univ, Bangalore 560100, Karnataka, India
Keywords
Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language-Image Pre-training (CLIP); StyleGAN2
DOI
10.1007/978-3-031-42508-0_8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The average user may have little to no artistic skill but can describe what they envision in words. With the aid of generative neural architectures, user-provided text can be instantly transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Prior works focus less on manipulation, so the approach also emphasizes manipulating and modifying the generated image based on additional textual descriptions, as required to further refine the expected face. The architecture consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-Training (CLIP) module that interprets the text and computes how close the final image encodings and the text are to each other within a common space, and a StyleGAN2 that decodes and generates the required image output. This combination of components is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
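The abstract does not state the loss or optimizer, so the following is only a minimal sketch of one way a "closeness in a common space" score could steer generation: cosine similarity between a text embedding and an image embedding, increased by nudging a latent vector. Everything here is an assumption for illustration. `toy_embed_image` is a hypothetical linear stand-in for "decode the latent with a generator, then embed the image", and `refine_latent` uses finite-difference hill climbing rather than any method attributed to the authors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def refine_latent(latent, text_embedding, embed_image, steps=200, lr=0.05, eps=1e-4):
    """Nudge a latent so embed_image(latent) points toward text_embedding.

    Gradients are estimated by forward finite differences, so the sketch
    needs no autodiff library. (Hypothetical helper, not the paper's method.)
    """
    latent = list(latent)
    for _ in range(steps):
        base = cosine_similarity(embed_image(latent), text_embedding)
        grad = []
        for i in range(len(latent)):
            bumped = latent[:]
            bumped[i] += eps
            score = cosine_similarity(embed_image(bumped), text_embedding)
            grad.append((score - base) / eps)
        # Gradient ascent: move the latent in the direction that raises similarity.
        latent = [z + lr * g for z, g in zip(latent, grad)]
    return latent

def toy_embed_image(latent):
    """Hypothetical stand-in for generator + image encoder (a fixed linear map)."""
    return [
        2.0 * latent[0] + 0.5 * latent[1],
        latent[1] - latent[2],
        0.3 * latent[0] + latent[2],
    ]

text_embedding = [0.2, 0.9, -0.4]   # made-up text embedding
start = [1.0, 0.0, 0.5]             # made-up initial latent

before = cosine_similarity(toy_embed_image(start), text_embedding)
refined = refine_latent(start, text_embedding, toy_embed_image)
after = cosine_similarity(toy_embed_image(refined), text_embedding)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In a real pipeline the embeddings would come from a pretrained text and image encoder and the update would use automatic differentiation; the toy map only shows the shape of the loop.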
Pages: 78-90
Page count: 13