Crossing the Gap: Domain Generalization for Image Captioning

Authors
Ren, Yuchen [1, 2]
Mao, Zhendong [1, 3]
Fang, Shancheng [1]
Lu, Yan [2]
He, Tong [2]
Du, Hao [1]
Zhang, Yongdong [1, 3]
Ouyang, Wanli [2]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China
[3] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00281
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing image captioning methods assume that the training and testing data come from the same domain, or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where data from the target domain are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which helps us investigate models' domain generalization (DG) ability on unseen domains. With the support of the new benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the challenge of the task and the effectiveness of our newly proposed benchmark and LSML framework.
Pages: 2871-2880
Number of pages: 10