Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Cited by: 0
Authors
Srinivasa, Anagha [1 ]
Praveen, Anjali [1 ]
Mavathur, Anusha [1 ]
Pothumarthi, Apurva [1 ]
Arya, Arti [1 ]
Agarwal, Pooja [1 ]
Affiliation
[1] PES Univ, Bangalore 560100, Karnataka, India
Keywords
Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language-Image Pre-training (CLIP); StyleGAN2
DOI
10.1007/978-3-031-42508-0_8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The average user may have little or no artistic skill but can describe in words what they envision. With the aid of generative neural architectures, user-provided text can be instantly transformed into a realistic image. This study proposes a novel approach for generating a facial image from a user-given textual description. Because prior works place less focus on manipulation, the approach also emphasizes manipulating and modifying the generated image based on additional textual descriptions, as required to further refine the expected face. The architecture consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-Training (CLIP) module that interprets the text and computes how close the final image encodings and the text are to each other within a common space, and a StyleGAN2 that decodes and generates the required image output. This combination of components is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
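The "how close ... within a common space" step in the abstract can be illustrated with a minimal sketch. It assumes closeness is measured by cosine similarity, which is how CLIP compares text and image embeddings; the abstract does not state the authors' exact scoring or loss, so the function names and the toy three-dimensional vectors below are hypothetical and stand in for real CLIP outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def clip_style_distance(text_emb, image_emb):
    """Assumed objective: 1 - cosine similarity, so 0 means perfectly aligned."""
    return 1.0 - cosine_similarity(text_emb, image_emb)


# Hypothetical toy embeddings; real CLIP embeddings have hundreds of dimensions.
text_emb = [1.0, 0.0, 1.0]
close_image_emb = [2.0, 0.0, 2.0]  # same direction as the text embedding
far_image_emb = [0.0, 1.0, 0.0]    # orthogonal to the text embedding

print(clip_style_distance(text_emb, close_image_emb))
print(clip_style_distance(text_emb, far_image_emb))
```

In a pipeline of the kind the abstract describes, a distance like this would be minimized with respect to the image encodings, so that the decoded face moves toward the text description. That optimization loop is not shown here.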
Pages: 78-90
Page count: 13