Crossing the Gap: Domain Generalization for Image Captioning

Cited by: 3
Authors
Ren, Yuchen [1, 2]
Mao, Zhendong [1, 3]
Fang, Shancheng [1]
Lu, Yan [2]
He, Tong [2]
Du, Hao [1]
Zhang, Yongdong [1, 3]
Ouyang, Wanli [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program
DOI
10.1109/CVPR52729.2023.00281
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing image captioning methods assume either that the training and testing data come from the same domain or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where data from the target domain are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), in which data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which helps us investigate models' domain generalization (DG) ability on unseen domains. With the support of the new benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the challenge of the task and the effectiveness of our newly proposed benchmark and LSML framework.
Pages: 2871-2880
Page count: 10