Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Times cited: 0
Authors
Srinivasa, Anagha [1]
Praveen, Anjali [1]
Mavathur, Anusha [1]
Pothumarthi, Apurva [1]
Arya, Arti [1]
Agarwal, Pooja [1]
Affiliation
[1] PES Univ, Bangalore 560100, Karnataka, India
Keywords
Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language-Image Pre-training (CLIP); StyleGAN2
DOI
10.1007/978-3-031-42508-0_8
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many users have little or no artistic skill but can describe in words what they envision. With the aid of generative neural architectures, such user-provided text can be instantly transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Because prior works focus less on manipulation, the approach also emphasizes manipulating and modifying the generated image based on additional textual descriptions, as required, to further refine the expected face. It consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-Training (CLIP) module that interprets the text and computes how close the final image encodings and the text are to each other in a common embedding space, and a StyleGAN2 that decodes the encodings and generates the output image. This combination of components has not been used in previous studies, and it yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
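The abstract names three building blocks but gives no training objectives, dimensions, or optimization details. The sketch below illustrates, with toy vectors, two generic operations the description relies on: the vector-quantization step of a VQ-VAE (snapping each encoder output to its nearest codebook entry) and a CLIP-style cosine similarity between an image embedding and a text embedding. It also includes a toy text-guided refinement loop (gradient ascent on that similarity). This is a minimal sketch under stated assumptions, not the authors' method: the function names `quantize`, `cosine_similarity`, and `nudge_towards_text` are hypothetical, the refinement loop assumes the latent lives directly in the shared embedding space (an identity image encoder), and the StyleGAN2 decoder is omitted entirely.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def quantize(z: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> Tuple[List[int], List[Vector]]:
    """VQ step: map each encoder output vector to its nearest codebook entry (squared L2)."""
    indices: List[int] = []
    quantized: List[Vector] = []
    for vec in z:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """CLIP-style score: cosine of the angle between two (non-zero) embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nudge_towards_text(latent: Sequence[float], text_emb: Sequence[float],
                       lr: float = 0.1, steps: int = 50) -> Vector:
    """Toy manipulation loop: gradient ascent on cos(latent, text_emb).

    Hypothetical assumption: the latent is already an embedding in the shared
    text-image space (identity image encoder, no StyleGAN2 in the loop).
    """
    x = list(latent)
    t_norm = math.sqrt(sum(v * v for v in text_emb))
    for _ in range(steps):
        x_norm = math.sqrt(sum(v * v for v in x))
        dot = sum(a * b for a, b in zip(x, text_emb))
        # d/dx cos(x, t) = t / (|x||t|) - (x.t) x / (|x|^3 |t|)
        grad = [t / (x_norm * t_norm) - dot * xi / (x_norm ** 3 * t_norm)
                for xi, t in zip(x, text_emb)]
        x = [xi + lr * g for xi, g in zip(x, grad)]
    return x
```

For example, with the codebook `[[0, 0], [1, 0], [0, 1]]`, the encoder outputs `[0.9, 0.1]` and `[0.1, 0.8]` quantize to indices `[1, 2]`, and nudging the latent `[1, 0]` toward the text embedding `[0, 1]` raises their cosine similarity from 0 toward 1. In a real pipeline the gradient would flow through a decoder and a learned image encoder rather than acting on the embedding directly.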
Pages: 78-90
Number of pages: 13