Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

Times cited: 15

Authors:

Liu, Hongyu [1]

Song, Yibing [2]

Chen, Qifeng [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[2] Fudan Univ, AI Inst, Shanghai, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.00971

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

GAN inversion and editing via StyleGAN maps an input image into the embedding spaces (W, W+, and F) to simultaneously maintain image fidelity and meaningful manipulation. From latent space W to extended latent space W+ to feature space F in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore W+ and F rather than W to improve reconstruction fidelity while maintaining editability. As W+ and F are derived from W, which is essentially the foundation latent space of StyleGAN, these GAN inversion methods focusing on the W+ and F spaces could be improved by stepping back to W. In this work, we propose to first obtain the proper latent code in the foundation latent space W. We introduce contrastive learning to align W and the image space for proper latent code discovery. Then, we leverage a cross-attention encoder to transform the obtained latent code in W into W+ and F, accordingly. Our experiments show that our exploration of the foundation latent space W improves the representation ability of latent codes in W+ and features in F, which yields state-of-the-art reconstruction fidelity and editability results on the standard benchmarks. Project page: https://kumapowerliu.github.io/CLCAE.
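The abstract states that contrastive learning is used to align W with the image space. As a rough illustration only, the sketch below computes a generic symmetric InfoNCE loss over a batch of paired latent codes and image features: each matched pair (same batch index) is a positive and every other pairing in the batch is a negative. This is not the authors' implementation. The function names (`info_nce`, `_normalize`, `_logsumexp`), the cosine-similarity scoring, the symmetric two-direction formulation, and the temperature of 0.07 are all assumptions, and the encoders that would produce these vectors are omitted; plain Python lists stand in for their outputs.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _logsumexp(xs: List[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(latents: List[Vector], image_feats: List[Vector],
             temperature: float = 0.07) -> float:
    """Generic symmetric InfoNCE loss (assumption: not the paper's exact loss).

    latents[i] and image_feats[i] form the positive pair for index i;
    all other (latent, image) pairings in the batch act as negatives.
    """
    assert len(latents) == len(image_feats) > 0
    z = [_normalize(v) for v in latents]
    f = [_normalize(v) for v in image_feats]
    n = len(z)
    # sim[i][j]: cosine similarity of latent i and image j, divided by temperature.
    sim = [[sum(a * b for a, b in zip(z[i], f[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Latent -> image direction: cross-entropy over each row, target column i.
    loss_l2i = sum(_logsumexp(sim[i]) - sim[i][i] for i in range(n)) / n
    # Image -> latent direction: cross-entropy over each column, target row i.
    loss_i2l = sum(_logsumexp([sim[j][i] for j in range(n)]) - sim[i][i]
                   for i in range(n)) / n
    return 0.5 * (loss_l2i + loss_i2l)
```

With matched pairs on the diagonal, for example `info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])`, the loss is close to zero, while swapping the image features makes it large. In actual training, this loss would be minimized with respect to the encoder parameters, which this sketch does not model.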

Pages: 10072-10082

Number of pages: 11
