InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Authors
Soohyun Kim
Jongbeom Baek
Jihye Park
Eunjae Ha
Homin Jung
Taeyoung Lee
Seungryong Kim
Affiliations
[1] Korea University
[2] Hanwha Systems Co., Ltd
Keywords
Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language;
Abstract
We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, that effectively integrates global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus among content features by taking context information into account through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features according to bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it has some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
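The replacement of LayerNorm with AdaIN mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `adain_tokens` and `style_to_affine`, the single linear map from style code to scale and shift, and all tensor sizes are assumptions chosen for illustration. It only shows the general AdaIN idea, which is to normalize each channel across the token dimension and then apply a scale and shift derived from a style code.

```python
import numpy as np


def adain_tokens(tokens, gamma, beta, eps=1e-5):
    """Apply AdaIN to a token sequence.

    tokens: array of shape (num_tokens, channels)
    gamma, beta: style-dependent scale and shift, each of shape (channels,)
    """
    # Per-channel statistics over the token dimension (instance statistics).
    mean = tokens.mean(axis=0, keepdims=True)
    std = tokens.std(axis=0, keepdims=True)
    normalized = (tokens - mean) / (std + eps)
    # Unlike LayerNorm's learned constants, scale and shift depend on the style.
    return gamma * normalized + beta


def style_to_affine(style_code, weight, bias):
    """Hypothetical linear map from a style code to (gamma, beta)."""
    params = style_code @ weight + bias  # shape (2 * channels,)
    gamma, beta = np.split(params, 2)
    return gamma, beta


# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
num_tokens, channels, style_dim = 16, 8, 4

tokens = rng.normal(size=(num_tokens, channels))
style_code = rng.normal(size=style_dim)
weight = rng.normal(size=(style_dim, 2 * channels))
bias = np.zeros(2 * channels)

gamma, beta = style_to_affine(style_code, weight, bias)
out = adain_tokens(tokens, gamma, beta)
```

Sampling a different `style_code` changes `gamma` and `beta` and therefore the statistics of the output tokens, which is what makes several translations of the same content possible under this sketch.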
Pages: 1167 - 1186
Page count: 19