Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Cited by: 0
Authors
Srinivasa, Anagha [1]
Praveen, Anjali [1]
Mavathur, Anusha [1]
Pothumarthi, Apurva [1]
Arya, Arti [1]
Agarwal, Pooja [1]
Affiliation
[1] PES Univ, Bangalore 560100, Karnataka, India
Keywords
Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language-Image Pre-training (CLIP); StyleGAN2
DOI
10.1007/978-3-031-42508-0_8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The average user may have little to no artistic skill but can describe in words what they envision. With the aid of generative neural architectures, user-provided text can be transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Prior works focus less on manipulation, so the approach also emphasizes modifying the generated image based on additional textual descriptions, as required to further refine the expected face. The architecture consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-Training (CLIP) module that interprets the text and computes how close the final image encodings and the text are to each other within a common space, and a StyleGAN2 that decodes and generates the required image output. This combination of components is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
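The abstract describes CLIP as measuring how close image encodings and text are within a common space. The following is a minimal sketch of that idea only, not the authors' implementation: the embeddings are hand-written toy vectors, `identity_embed` is a hypothetical stand-in for a real image encoder, and `refine_latent` uses a simple random search rather than whatever optimization the paper actually uses.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_loss(image_embedding, text_embedding):
    """Lower is better: 0 when the two embeddings point the same way."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


def refine_latent(latent, text_embedding, embed_image,
                  steps=200, step_size=0.1, seed=0):
    """Random-search refinement (an assumption, chosen for simplicity):
    keep a perturbed latent only if it lowers the text-image loss."""
    rng = random.Random(seed)
    best = list(latent)
    best_loss = clip_style_loss(embed_image(best), text_embedding)
    for _ in range(steps):
        candidate = [v + rng.gauss(0.0, step_size) for v in best]
        loss = clip_style_loss(embed_image(candidate), text_embedding)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


def identity_embed(latent):
    """Hypothetical stand-in for an image encoder: returns the latent as-is."""
    return latent


# Toy values standing in for a text embedding and an initial image latent.
text_embedding = [0.2, -0.5, 0.8, 0.1]
initial_latent = [1.0, 1.0, -1.0, 0.5]

initial_loss = clip_style_loss(identity_embed(initial_latent), text_embedding)
refined_latent, refined_loss = refine_latent(
    initial_latent, text_embedding, identity_embed
)
print(f"loss before: {initial_loss:.3f}, after: {refined_loss:.3f}")
```

In a real pipeline the stand-in encoder would be replaced by a generator followed by an image encoder, and the additional text descriptions mentioned in the abstract would supply new target embeddings for further refinement.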
Pages: 78-90
Page count: 13
Related papers
50 items in total
  • [21] MFECLIP: CLIP With Mapping-Fusion Embedding for Text-Guided Image Editing
    Wu, Fei
    Ma, Yongheng
    Jin, Hao
    Jing, Xiao-Yuan
    Jiang, Guo-Ping
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 116 - 120
  • [22] CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search
    Shamshad, Fahad
    Naseer, Muzammal
    Nandakumar, Karthik
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 20595 - 20605
  • [23] Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN
    Albahar, Badour
    Lu, Jingwan
    Yang, Jimei
    Shu, Zhixin
    Shechtman, Eli
    Huang, Jia-Bin
    ACM TRANSACTIONS ON GRAPHICS, 2021, 40 (06):
  • [24] TCGIS: Text and Contour Guided Controllable Image Synthesis
    Zhang, Zhiqiang
    Yu, Wenxin
    Fan, Yibo
    Zhou, Jinjia
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT GENERATION AND EVALUATION, MCGE 2023: New Methods and Practice, 2023, : 75 - 79
  • [25] Text-Guided Customizable Image Synthesis and Manipulation
    Zhang, Zhiqiang
    Fu, Chen
    Weng, Wei
    Zhou, Jinjia
    APPLIED SCIENCES-BASEL, 2022, 12 (20):
  • [26] An attempt to construct the individual model of daily facial skin temperature using variational autoencoder
    Masaki, Ayaka
    Nagumo, Kent
    Iwashita, Yuki
    Oiwa, Kosuke
    Nozawa, Akio
    ARTIFICIAL LIFE AND ROBOTICS, 2021, 26 (04) : 488 - 493
  • [27] Variational autoencoder-based neural electrocardiogram synthesis trained by FEM-based heart simulator
    Nishikimi, Ryo
    Nakano, Masahiro
    Kashino, Kunio
    Tsukada, Shingo
    CARDIOVASCULAR DIGITAL HEALTH JOURNAL, 2024, 5 (01) : 19 - 28
  • [28] Text to Image Synthesis Using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks
    Tibebu, Haileleol
    Malik, Aadin
    De Silva, Varuna
    INTELLIGENT COMPUTING, VOL 1, 2022, 506 : 560 - 580
  • [29] Magnetic State Generation using Hamiltonian Guided Variational Autoencoder with Spin Structure Stabilization
    Kwon, Hee Young
    Yoon, Han Gyu
    Park, Sung Min
    Lee, Doo Bong
    Choi, Jun Woo
    Won, Changyeon
    ADVANCED SCIENCE, 2021, 8 (11)
  • [30] Image Reconstruction Using Pre-Trained Autoencoder on Multimode Fiber Imaging System
    Li, Yuang
    Yu, Zhenming
    Chen, Yudi
    He, Tiantian
    Zhang, Jiaying
    Zhao, Ruining
    Xu, Kun
    IEEE PHOTONICS TECHNOLOGY LETTERS, 2020, 32 (13) : 779 - 782