Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning stronger vision-language associations, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that utilizes extra cross-task supervision, such as data for fine-grained image classification, to alleviate the scarcity of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
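The contrastive alignment described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a generic symmetric InfoNCE objective of the kind commonly used to align paired embeddings (here, hypothetical visual-difference features and caption features), with the batch's matching pairs treated as positives and all other pairings as negatives.

```python
import numpy as np

def info_nce_loss(diff_feats, text_feats, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired visual-difference
    and caption embeddings, shape (B, D) each."""
    # L2-normalize so the dot product becomes cosine similarity
    v = diff_feats / np.linalg.norm(diff_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = v @ t.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))     # matching pairs lie on the diagonal

    def xent(l):
        # numerically stable cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average both retrieval directions: difference->text and text->difference
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned pairs the loss approaches zero, while shuffled (misaligned) pairs yield a higher loss, which is the property the paper's fine-grained alignment objective exploits.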
Pages: 3108-3116
Page count: 9
Related Papers
50 total
  • [21] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3023 - 3032
  • [22] VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
    Chen, Qibin
    Lacomis, Jeremy
    Schwartz, Edward J.
    Neubig, Graham
    Vasilescu, Bogdan
    Le Goues, Claire
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 2327 - 2339
  • [23] Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
    You, Haoxuan
    Zhou, Luowei
    Xiao, Bin
    Codella, Noel
    Cheng, Yu
    Xu, Ruochen
    Chang, Shih-Fu
    Yuan, Lu
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 69 - 87
  • [24] Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data
    Janda, Andrej
    Wagstaff, Brandon
    Ng, Edwin G.
    Kelly, Jonathan
    2023 20TH CONFERENCE ON ROBOTS AND VISION, CRV, 2023, : 145 - 152
  • [25] Single-Stream Extractor Network With Contrastive Pre-Training for Remote-Sensing Change Captioning
    Zhou, Qing
    Gao, Junyu
    Yuan, Yuan
    Wang, Qi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [26] A Multi-view Molecular Pre-training with Generative Contrastive Learning
    Liu, Yunwu
    Zhang, Ruisheng
    Yuan, Yongna
    Ma, Jun
    Li, Tongfeng
    Yu, Zhixuan
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2024, 16 (03) : 741 - 754
  • [27] Contrastive Learning With Enhancing Detailed Information for Pre-Training Vision Transformer
    Liang, Zhuomin
    Bai, Liang
    Fan, Jinyu
    Yang, Xian
    Liang, Jiye
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 219 - 231
  • [28] Adversarial momentum-contrastive pre-training
    Xu, Cong
    Li, Dan
    Yang, Min
    PATTERN RECOGNITION LETTERS, 2022, 160 : 172 - 179
  • [29] Contrastive Pre-Training of GNNs on Heterogeneous Graphs
    Jiang, Xunqiang
    Lu, Yuanfu
    Fang, Yuan
    Shi, Chuan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 803 - 812
  • [30] Using contrastive language-image pre-training for Thai recipe recommendation
    Chuenbanluesuk, Thanatkorn
    Plodprong, Voramate
    Karoon, Weerasak
    Rueangsri, Kotchakorn
    Pojam, Suthasinee
    Siriborvornratanakul, Thitirat
    LANGUAGE RESOURCES AND EVALUATION, 2025,