Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning stronger vision-language associations, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that utilizes extra cross-task supervision, such as data for fine-grained image classification, to alleviate the scarcity of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
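The contrastive alignment described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic symmetric InfoNCE objective of the kind commonly used to align paired embeddings (here, hypothetical visual-difference features and caption features), with the batch's matching pairs treated as positives and all other pairings as negatives.

```python
import numpy as np

def info_nce_loss(diff_feats, text_feats, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired visual-difference
    and caption embeddings, shape (B, D) each."""
    # L2-normalize so the dot product becomes cosine similarity
    v = diff_feats / np.linalg.norm(diff_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = v @ t.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))     # matching pairs lie on the diagonal

    def xent(l):
        # numerically stable cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average both retrieval directions: difference->text and text->difference
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned pairs the loss approaches zero, while shuffled (misaligned) pairs yield a higher loss, which is the property the paper's fine-grained alignment objective exploits.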
Pages: 3108-3116
Page count: 9
Related Papers
50 total
  • [21] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3023 - 3032
  • [22] VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
    Chen, Qibin
    Lacomis, Jeremy
    Schwartz, Edward J.
    Neubig, Graham
    Vasilescu, Bogdan
    Le Goues, Claire
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 2327 - 2339
  • [23] Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
    You, Haoxuan
    Zhou, Luowei
    Xiao, Bin
    Codella, Noel
    Cheng, Yu
    Xu, Ruochen
    Chang, Shih-Fu
    Yuan, Lu
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 69 - 87
  • [24] Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data
    Janda, Andrej
    Wagstaff, Brandon
    Ng, Edwin G.
    Kelly, Jonathan
    2023 20TH CONFERENCE ON ROBOTS AND VISION, CRV, 2023, : 145 - 152
  • [25] Single-Stream Extractor Network With Contrastive Pre-Training for Remote-Sensing Change Captioning
    Zhou, Qing
    Gao, Junyu
    Yuan, Yuan
    Wang, Qi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [26] A Multi-view Molecular Pre-training with Generative Contrastive Learning
    Liu, Yunwu
    Zhang, Ruisheng
    Yuan, Yongna
    Ma, Jun
    Li, Tongfeng
    Yu, Zhixuan
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2024, 16 (03) : 741 - 754
  • [27] Contrastive Learning With Enhancing Detailed Information for Pre-Training Vision Transformer
    Liang, Zhuomin
    Bai, Liang
    Fan, Jinyu
    Yang, Xian
    Liang, Jiye
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 219 - 231
  • [28] Adversarial momentum-contrastive pre-training
    Xu, Cong
    Li, Dan
    Yang, Min
    PATTERN RECOGNITION LETTERS, 2022, 160 : 172 - 179
  • [29] Contrastive Pre-Training of GNNs on Heterogeneous Graphs
    Jiang, Xunqiang
    Lu, Yuanfu
    Fang, Yuan
    Shi, Chuan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 803 - 812
  • [30] Using contrastive language-image pre-training for Thai recipe recommendation
    Chuenbanluesuk, Thanatkorn
    Plodprong, Voramate
    Karoon, Weerasak
    Rueangsri, Kotchakorn
    Pojam, Suthasinee
    Siriborvornratanakul, Thitirat
    LANGUAGE RESOURCES AND EVALUATION, 2025,