Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Cited by: 0

Authors:

Srinivasa, Anagha [1]

Praveen, Anjali [1]

Mavathur, Anusha [1]

Pothumarthi, Apurva [1]

Arya, Arti [1]

Agarwal, Pooja [1]

Affiliations:

[1] PES Univ, Bangalore 560100, Karnataka, India

Source:

ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II | 2023 / Vol. 14126

Keywords:

Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language Image Pre-training (CLIP); StyleGAN2;

DOI:

10.1007/978-3-031-42508-0_8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The average user may have little to no artistic skill but can describe what they envision in words. With the aid of generative neural architectures, user-provided text can be transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Prior works focus less on manipulation, so the approach also emphasizes manipulating and modifying the generated image, based on additional textual descriptions, to further refine the expected face. It consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, the Contrastive Language-Image Pre-Training (CLIP) module, which interprets the text and computes how close the final image encodings and the text are to each other within a common space, and a StyleGAN2 to decode and generate the required image output. This combination of components within one architecture is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
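
The abstract describes CLIP as measuring how close image encodings and text are within a common space. The sketch below illustrates that idea only, and is not the authors' implementation: it assumes the closeness measure is cosine similarity and uses small hand-written vectors as stand-ins for real CLIP embeddings. The function names `clip_style_loss` and `best_match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def clip_style_loss(image_embedding, text_embedding):
    """Assumed guidance loss: 0 when aligned, 2 when opposite."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


def best_match(candidate_embeddings, text_embedding):
    """Index of the candidate image embedding closest to the text."""
    losses = [clip_style_loss(e, text_embedding) for e in candidate_embeddings]
    return min(range(len(losses)), key=losses.__getitem__)


# Toy stand-ins for embeddings in a shared 3-dimensional space.
text = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],    # orthogonal to the text
    [2.0, 0.1, 1.9],    # nearly parallel to the text
    [-1.0, 0.0, -1.0],  # opposite to the text
]
print(best_match(candidates, text))
```

In a full pipeline, a loss of this kind would typically be minimized with respect to the image encodings so that the decoded face moves toward the description; the record above does not give those details, so they are left out here.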


Pages: 78-90

Page count: 13
