Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Cited by: 0

Authors:

Srinivasa, Anagha [1]

Praveen, Anjali [1]

Mavathur, Anusha [1]

Pothumarthi, Apurva [1]

Arya, Arti [1]

Agarwal, Pooja [1]

Affiliation:

[1] PES Univ, Bangalore 560100, Karnataka, India

Source:

ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II | 2023 / Vol. 14126

Keywords:

Facial synthesis; Image manipulation; Vector Quantized Variational Autoencoders (VQVAE); Contrastive Language-Image Pre-training (CLIP); StyleGAN2

DOI:

10.1007/978-3-031-42508-0_8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The average user may have little to no artistic skill but can describe what they envision in words. With the aid of generative neural architectures, user-provided text can be instantly transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Because prior works focus less on manipulation, the approach also emphasizes manipulating and modifying the generated image based on additional textual descriptions, as required, to further refine the expected face. It consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-Training (CLIP) module that interprets the text and computes how close the final image encodings and the text are to each other within a common space, and a StyleGAN2 that decodes and generates the required image output. The combination of these components within one architecture is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.


Pages: 78-90

Page count: 13
