Cross-Domain Image Captioning with Discriminative Finetuning

Cited by: 6

Authors:

Dessi, Roberto [1]

Bevilacqua, Michele [2]

Gualdoni, Eleonora [3]

Carraz Rakotonirina, Nathanael [3]

Franzon, Francesca [3]

Baroni, Marco [4]

Affiliations:

[1] UPF, Meta AI, Barcelona, Spain

[2] Samaya AI, Mountain View, CA, USA

[3] UPF, Barcelona, Spain

[4] UPF, ICREA, Barcelona, Spain

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

Funding:

European Research Council

DOI:

10.1109/CVPR52729.2023.00670

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify that image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
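
The discriminative objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that caption and image embeddings are already available as plain vectors, that the retriever ranks candidates by cosine similarity, and that the training signal is the log-probability of the target image under a softmax over the candidates. The function names (`cosine`, `retrieval_log_prob`, `retrieved_index`) and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieval_log_prob(caption_vec, image_vecs, target_idx, temperature=1.0):
    """Log-probability that the retriever picks the target image.

    Assumption: the retriever applies a softmax over the (scaled) cosine
    similarities between the caption and every candidate image.
    """
    scores = [cosine(caption_vec, img) / temperature for img in image_vecs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target_idx] - log_z


def retrieved_index(caption_vec, image_vecs):
    """Index of the candidate image most similar to the caption."""
    scores = [cosine(caption_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: three candidate images, the caption is closest to the first.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
caption = [0.9, 0.1]
reward = retrieval_log_prob(caption, candidates, target_idx=0)
```

In a finetuning loop, a quantity like `reward` could serve as the signal that pushes the captioner toward descriptions that single out the target image; how the paper feeds this signal back into the captioner is not stated in the abstract, so that step is left out here.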

Citation

Pages: 6935-6944

Page count: 10
