Crossing the Gap: Domain Generalization for Image Captioning

Cited by: 3
Authors
Ren, Yuchen [1, 2]
Mao, Zhendong [1, 3]
Fang, Shancheng [1]
Lu, Yan [2]
He, Tong [2]
Du, Hao [1]
Zhang, Yongdong [1, 3]
Ouyang, Wanli [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00281
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing image captioning methods assume either that the training and testing data come from the same domain or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where target-domain data are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), in which data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which enables us to investigate models' domain generalization (DG) ability on unseen domains. Building on this benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the difficulty of the task and the effectiveness of the proposed benchmark and LSML framework.
Pages: 2871-2880
Number of pages: 10