Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning a stronger association between vision and language, and 2) the high cost of manual annotation, which limits the amount of supervised data. To address these challenges, we propose a new modeling framework following the pre-training-finetuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that uses extra cross-task supervision information, such as data for fine-grained image classification, to alleviate the shortage of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
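The abstract does not specify the contrastive objective it uses. As a rough illustration of what aligning visual-difference representations with text descriptions can look like, the following is a minimal sketch of a generic symmetric InfoNCE loss. It is not the authors' implementation: the function name `info_nce`, the `temperature` value, and the toy embeddings are all assumptions made for this example.

```python
import math


def info_nce(diff_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of (visual difference, caption) pairs.

    Hypothetical helper for illustration only; pair i in diff_embs is
    assumed to match caption i in text_embs.
    """
    def normalize(v):
        # L2-normalize so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    d = [normalize(v) for v in diff_embs]
    t = [normalize(v) for v in text_embs]
    n = len(d)

    # logits[i][j]: similarity of difference i and caption j, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(d[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Mean negative log-softmax of the matched (diagonal) entry in each row.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the difference-to-text and text-to-difference directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy usage: matched pairs should score a lower loss than mismatched ones.
diffs = [[1.0, 0.0], [0.0, 1.0]]
matched = info_nce(diffs, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce(diffs, [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the loss is small when each difference embedding is closest to its own caption and grows when captions are swapped, which is the behaviour a contrastive alignment objective is meant to encourage.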
Pages: 3108-3116
Page count: 9
Related papers
50 in total
  • [31] VideoTRM: Pre-training for Video Captioning Challenge 2020
    Chen, Jingwen
    Chao, Hongyang
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4605 - 4609
  • [32] Contrastive Code-Comment Pre-training
    Pei, Xiaohuan
    Liu, Daochang
    Qian, Luo
    Xu, Chang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 398 - 407
  • [33] Contrastive Pre-training for Personalized Expert Finding
    Peng, Qiyao
    Liu, Hongtao
    Lv, Zhepeng
    Yang, Qing
    Wang, Wenjun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 15797 - 15806
  • [34] Temporal Contrastive Pre-Training for Sequential Recommendation
    Tian, Changxin
    Lin, Zihan
    Bian, Shuqing
    Wang, Jinpeng
    Zhao, Wayne Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1925 - 1934
  • [35] iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
    Wei, Yixuan
    Cao, Yue
    Zhang, Zheng
    Peng, Houwen
    Yao, Zhuliang
    Xie, Zhenda
    Hu, Han
    Guo, Baining
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2776 - 2786
  • [36] ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images
    Yang, Jiawei
    Chen, Hanbo
    Liang, Yuan
    Huang, Junzhou
    He, Lei
    Yao, Jianhua
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 523 - 539
  • [37] MoleMCL: a multi-level contrastive learning framework for molecular pre-training
    Zhang, Xinyi
    Xu, Yanni
    Jiang, Changzhi
    Shen, Lian
    Liu, Xiangrong
    BIOINFORMATICS, 2024, 40 (04)
  • [38] Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation
    Zhan, Albert
    Zhao, Ruihan
    Pinto, Lerrel
    Abbeel, Pieter
    Laskin, Michael
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 4040 - 4047
  • [39] Contrastive Pre-training with Adversarial Perturbations for Check-in Sequence Representation Learning
    Gong, Letian
    Lin, Youfang
    Guo, Shengnan
    Lin, Yan
    Wang, Tianyi
    Zheng, Erwen
    Zhou, Zeyu
    Wan, Huaiyu
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 4276 - 4283
  • [40] Pre-training local and non-local geographical influences with contrastive learning
    Oh, Byungkook
    Suh, Ilhyun
    Cha, Kihoon
    Kim, Junbeom
    Park, Goeon
    Jeong, Sihyun
    KNOWLEDGE-BASED SYSTEMS, 2023, 259