Crossing the Gap: Domain Generalization for Image Captioning

Times cited: 3

Authors:

Ren, Yuchen [1,2]

Mao, Zhendong [1,3]

Fang, Shancheng [1]

Lu, Yan [2]

He, Tong [2]

Du, Hao [1]

Zhang, Yongdong [1,3]

Ouyang, Wanli [2]

Affiliations:

[1] University of Science and Technology of China, Hefei, People's Republic of China

[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China

[3] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, People's Republic of China

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

DOI:

10.1109/CVPR52729.2023.00281

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing image captioning methods assume that the training and testing data come from the same domain, or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where data from the target domain are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), in which data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which helps us investigate models' domain generalization (DG) ability on unseen domains. With the support of the new benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the difficulty of the task and the effectiveness of our newly proposed benchmark and LSML framework.
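
The DGIC constraint described in the abstract (target-domain data are never seen during learning) can be illustrated with a minimal Python sketch. It assumes a leave-one-domain-out protocol, a common domain generalization setup that this record does not state; the function name `dgic_splits`, the sample format, and the domain names are hypothetical. This is not the paper's benchmark split and not the LSML framework.

```python
from typing import Dict, List, Tuple

# Hypothetical caption record: (image_id, caption).
Sample = Tuple[str, str]
# One split: (target_domain, train_samples, test_samples).
Split = Tuple[str, List[Sample], List[Sample]]


def dgic_splits(data_by_domain: Dict[str, List[Sample]]) -> List[Split]:
    """Build leave-one-domain-out splits (an assumed protocol, not the paper's).

    Each domain in turn is held out as the unseen target domain: none of its
    samples are used for training, and it is used only for testing.
    """
    splits: List[Split] = []
    for target in sorted(data_by_domain):
        # Training pool: every sample from every other (source) domain.
        train = [
            sample
            for domain in sorted(data_by_domain)
            if domain != target
            for sample in data_by_domain[domain]
        ]
        # Test set: the held-out target domain only.
        test = list(data_by_domain[target])
        splits.append((target, train, test))
    return splits


if __name__ == "__main__":
    # Toy data with hypothetical domain names.
    data = {
        "domain_a": [("a1", "a dog runs on the grass")],
        "domain_b": [("b1", "a red car parked on a street")],
        "domain_c": [("c1", "a bowl of soup on a table")],
    }
    for target, train, test in dgic_splits(data):
        train_ids = {image_id for image_id, _ in train}
        test_ids = {image_id for image_id, _ in test}
        # The target domain must stay unseen during learning.
        assert train_ids.isdisjoint(test_ids)
        print(target, len(train), len(test))
```

The key design point is that the training pool is built only from source domains, so a model trained on it has no access to the target domain, matching the setting the abstract defines.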


Pages: 2871-2880

Page count: 10
