Crossing the Gap: Domain Generalization for Image Captioning

Cited by: 3

Authors:

Ren, Yuchen [1, 2]

Mao, Zhendong [1, 3]

Fang, Shancheng [1]

Lu, Yan [2]

He, Tong [2]

Du, Hao [1]

Zhang, Yongdong [1, 3]

Ouyang, Wanli [2]

Affiliations:

[1] University of Science and Technology of China, Hefei, People's Republic of China

[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China

[3] Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, People's Republic of China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

DOI:

10.1109/CVPR52729.2023.00281

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing image captioning methods assume that the training and testing data come from the same domain, or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where data from the target domain are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), in which data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which lets us investigate models' domain generalization (DG) ability on unseen domains. Building on the new benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the difficulty of the task and the effectiveness of the proposed benchmark and LSML framework.

Pages: 2871-2880

Page count: 10
