InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Cited by: 0
Authors
Soohyun Kim
Jongbeom Baek
Jihye Park
Eunjae Ha
Homin Jung
Taeyoung Lee
Seungryong Kim
Affiliations
[1] Korea University
[2] Hanwha Systems Co., Ltd.
Keywords
Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language
DOI
Not available
Abstract
We present InstaFormer, a novel Transformer-based network architecture for instance-aware image-to-image translation that effectively integrates global- and instance-level information. Treating the content features extracted from an image as visual tokens, our model captures the global consensus of content features by modeling contextual information through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content feature using bounding box information, our framework learns the interaction between object instances and the global image, thus improving instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and the translated image. Although InstaFormer attains competitive performance, it has limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these limitations, we propose InstaFormer++, an extension of InstaFormer that enables multi-domain translation in instance-aware image translation for the first time. We obtain pseudo domain labels by leveraging a list of candidate domain labels in text format together with a pretrained vision-language model. We conduct experiments demonstrating the effectiveness of our methods against the latest methods and provide extensive ablation studies.
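The idea of augmenting visual tokens with instance-level features taken from bounding boxes can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: each cell of the content feature map becomes one global token (a patch size of 1 is assumed), and each instance token is obtained by plain average pooling over an integer bounding box, a simplified stand-in for RoI pooling operators such as RoIAlign. The box format and the names `box_avg_pool` and `build_token_sequence` are hypothetical.

```python
from typing import List, Tuple

# A content feature map indexed as [height][width][channels].
FeatureMap = List[List[List[float]]]
# Hypothetical box format: (x0, y0, x1, y1) in feature-map cells, end-exclusive.
Box = Tuple[int, int, int, int]


def box_avg_pool(fmap: FeatureMap, box: Box) -> List[float]:
    """Average the content features inside one bounding box.

    Simplified stand-in for RoI pooling (e.g. RoIAlign): integer box, no
    bilinear sampling, one output vector per box.
    """
    x0, y0, x1, y1 = box
    height, width = len(fmap), len(fmap[0])
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"box {box} is empty or outside the feature map")
    channels = len(fmap[0][0])
    acc = [0.0] * channels
    for y in range(y0, y1):
        for x in range(x0, x1):
            for c in range(channels):
                acc[c] += fmap[y][x][c]
    count = (x1 - x0) * (y1 - y0)
    return [a / count for a in acc]


def build_token_sequence(fmap: FeatureMap, boxes: List[Box]) -> List[List[float]]:
    """Global tokens (one per cell, row-major) followed by one instance token per box."""
    tokens = [list(cell) for row in fmap for cell in row]
    tokens.extend(box_avg_pool(fmap, box) for box in boxes)
    return tokens


if __name__ == "__main__":
    fmap = [[[1.0], [2.0]],
            [[3.0], [4.0]]]
    # Four global tokens plus one instance token for the right-hand column.
    print(build_token_sequence(fmap, [(1, 0, 2, 2)]))
    # prints [[1.0], [2.0], [3.0], [4.0], [3.0]]
```

Because the instance tokens share the sequence with the global tokens, self-attention can mix information between objects and the whole image, which is the interaction the abstract describes.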
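The replacement of LayerNorm with AdaIN can be illustrated with a small sketch. LayerNorm normalizes each token over its own channels and then applies learned, input-independent affine parameters; AdaIN instead normalizes each channel across all tokens and then applies a scale `gamma` and shift `beta` derived from a style code, so different style codes produce different outputs for the same content. In this sketch `gamma` and `beta` are passed in directly (the network that maps a style code to them is omitted), and computing the statistics over the token axis is an assumption that follows the original AdaIN formulation rather than a detail stated in the abstract.

```python
import math
from typing import List

# Visual tokens indexed as [num_tokens][channels].
Tokens = List[List[float]]


def adain(tokens: Tokens, gamma: List[float], beta: List[float], eps: float = 1e-5) -> Tokens:
    """Adaptive instance normalization applied to a token sequence.

    Each channel is normalized with its mean and variance across all tokens,
    then scaled by gamma[c] and shifted by beta[c]. In a full model these would
    be predicted from a style code (mapping network not shown).
    """
    n, channels = len(tokens), len(tokens[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [tok[c] for tok in tokens]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)
        for i in range(n):
            out[i][c] = gamma[c] * (tokens[i][c] - mean) / std + beta[c]
    return out


def layer_norm(tokens: Tokens, eps: float = 1e-5) -> Tokens:
    """LayerNorm for comparison (affine part omitted): each token normalized over its own channels."""
    out = []
    for tok in tokens:
        mean = sum(tok) / len(tok)
        var = sum((v - mean) ** 2 for v in tok) / len(tok)
        out.append([(v - mean) / math.sqrt(var + eps) for v in tok])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 6.0]]
    # Two different "style codes" give two different outputs for the same content.
    print(adain(tokens, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
    print(adain(tokens, gamma=[2.0, 0.5], beta=[5.0, -1.0]))
```

The design point is that LayerNorm's affine parameters are fixed after training, whereas AdaIN's are supplied per sample, which is what makes multi-modal translation with style codes possible.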
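The abstract does not give the exact form of the instance-level content contrastive loss. One plausible reading is an InfoNCE objective over instance features, in the spirit of patch-wise contrastive learning: the feature of instance i in the translated image should be most similar to the feature of the same instance in the input, with the other input instances acting as negatives. The sketch below assumes that form, cosine-similarity logits, and a temperature `tau` of 0.07; all three are assumptions, and `instance_nce_loss` is a hypothetical name.

```python
import math
from typing import List

Vec = List[float]


def _l2_normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def instance_nce_loss(src_feats: List[Vec], out_feats: List[Vec], tau: float = 0.07) -> float:
    """InfoNCE over per-instance features (assumed form).

    src_feats[i]: feature of instance i pooled from the input image.
    out_feats[i]: feature of the same instance pooled from the translated image.
    For query out_feats[i], src_feats[i] is the positive and every
    src_feats[j] with j != i is a negative.
    """
    if len(src_feats) != len(out_feats) or len(src_feats) < 2:
        raise ValueError("need matching lists with at least two instances")
    src = [_l2_normalize(v) for v in src_feats]
    out = [_l2_normalize(v) for v in out_feats]
    total = 0.0
    for i, query in enumerate(out):
        logits = [sum(a * b for a, b in zip(query, key)) / tau for key in src]
        # Cross-entropy with target i, using a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(out)


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]
    # Content preserved per instance: low loss.
    print(instance_nce_loss(src, [[0.9, 0.1], [0.1, 0.9]]))
    # Instances swapped: high loss.
    print(instance_nce_loss(src, [[0.1, 0.9], [0.9, 0.1]]))
```

Minimizing such a loss pushes each translated object to keep the content of its own source region rather than that of other objects, which matches the stated goal of improving translation quality at object regions.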
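Obtaining pseudo domain labels from a list of candidate text labels and a pretrained vision-language model can be read as zero-shot classification: embed the image and each candidate label, compare them by cosine similarity, and keep the best-matching label. The sketch below assumes the embeddings were already produced by a pretrained image encoder and text encoder (for example a CLIP-style model, not shown); the candidate labels, toy vectors, temperature, and the function name `pseudo_domain_label` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pseudo_domain_label(
    image_emb: List[float],
    label_embs: Dict[str, List[float]],
    temperature: float = 0.01,
) -> Tuple[str, Dict[str, float]]:
    """Zero-shot pseudo domain label from precomputed image/text embeddings.

    label_embs maps each candidate domain label (text) to its text embedding.
    Returns the best-matching label and a softmax distribution over labels.
    """
    names = list(label_embs)
    logits = [cosine(image_emb, label_embs[name]) / temperature for name in names]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    best = max(names, key=lambda name: probs[name])
    return best, probs


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real text-encoder outputs.
    candidates = {"sunny": [1.0, 0.0], "night": [0.0, 1.0], "rainy": [0.7, 0.7]}
    label, _ = pseudo_domain_label([0.1, 0.9], candidates)
    print(label)  # prints night
```

In this toy setup the image embedding lies closest to the `night` text embedding, so `night` becomes its pseudo domain label and can stand in for a missing domain annotation during training.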
Pages: 1167-1186
Page count: 19