InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Cited by: 0

Authors:

Soohyun Kim

Jongbeom Baek

Jihye Park

Eunjae Ha

Homin Jung

Taeyoung Lee

Seungryong Kim

Affiliations:

[1] Korea University

[2] Hanwha Systems Co., Ltd.

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language

DOI:

Not available

Abstract:

We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By treating the content features extracted from an image as visual tokens, our model discovers a global consensus among content features by exploiting context information through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features using bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it may face some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these limitations, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
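The abstract names four mechanisms: pooling instance features from bounding boxes, AdaIN in place of LayerNorm, an instance-level contrastive loss, and pseudo domain labels chosen by text-image similarity. The following is a minimal, dependency-free sketch of each, under stated assumptions; it is not the authors' implementation. All function names are hypothetical, `box_pool` is a simplified stand-in for RoIAlign-style pooling, the loss uses a generic InfoNCE form with an assumed temperature `tau=0.07` that may differ from the paper's formulation, and the toy embeddings and candidate labels ("sunny", "night") stand in for the outputs of a pretrained vision-language model.

```python
import math

# Hypothetical, dependency-free sketch of ideas named in the abstract.
# Vectors are plain Python lists; all names and numbers are illustrative only.

def box_pool(feature_map, box):
    """Average the C-dim features inside box (x0, y0, x1, y1), end-exclusive.
    feature_map is indexed [row][col] -> list of C floats. A simplified
    stand-in for RoIAlign-style instance feature extraction."""
    x0, y0, x1, y1 = box
    cells = [feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return [sum(channel) / len(cells) for channel in zip(*cells)]

def adain(tokens, gamma, beta, eps=1e-5):
    """Adaptive instance normalization over a set of tokens.
    Each channel c is normalized across all tokens (LayerNorm would instead
    normalize each token across its channels), then scaled by gamma[c] and
    shifted by beta[c], which are assumed to be predicted from a style code."""
    out = [list(t) for t in tokens]
    for c in range(len(tokens[0])):
        column = [t[c] for t in tokens]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column) + eps)
        for i, v in enumerate(column):
            out[i][c] = gamma[c] * (v - mean) / std + beta[c]
    return out

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def instance_nce(translated, source, tau=0.07):
    """InfoNCE-style loss: translated instance i is pulled toward source
    instance i (positive) and pushed away from source instances j != i."""
    total = 0.0
    for i, q in enumerate(translated):
        logits = [cosine(q, k) / tau for k in source]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(translated)

def pseudo_domain_label(image_emb, text_embs, labels):
    """Pick the candidate label whose text embedding is most similar to the image."""
    scores = [cosine(image_emb, t) for t in text_embs]
    return labels[max(range(len(labels)), key=scores.__getitem__)]

# Toy usage with made-up numbers.
fmap = [[[1.0, 0.0], [3.0, 2.0]],
        [[5.0, 4.0], [7.0, 6.0]]]
inst = box_pool(fmap, (0, 0, 2, 1))          # top row only -> [2.0, 1.0]
tokens = [[1.0, 2.0], [3.0, 4.0], inst]      # global tokens plus one instance token
styled = adain(tokens, gamma=[2.0, 1.0], beta=[0.5, -1.0])
print(inst)
print(pseudo_domain_label([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]], ["sunny", "night"]))
```

After `adain`, each channel's mean across tokens equals the corresponding `beta` value, which is how a style code can set per-channel statistics. In `instance_nce`, identical matched features give a loss near zero, while swapping the instances gives a large loss.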

Pages: 1167–1186

Number of pages: 19
