Image Difference Captioning with Pre-training and Contrastive Learning

Cited by: 0
Authors
Yao, Linli [1]
Wang, Weiying [1]
Jin, Qin [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning a stronger vision-language association, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training-and-fine-tuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that exploits extra cross-task supervision, such as data for fine-grained image classification, to alleviate the scarcity of available supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed modeling framework. The code and models will be released at https://github.com/yaolinli/IDC.
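The fine-grained contrastive alignment the abstract describes can be illustrated with a generic symmetric InfoNCE objective over paired difference/text embeddings. This is a minimal NumPy sketch of the standard formulation, not necessarily the paper's exact loss; the function name, temperature value, and embedding shapes are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(img_diff_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_diff_emb, text_emb: (batch, dim) arrays; row i of each
    is assumed to be a matching (positive) pair, all other rows
    in the batch serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    a = img_diff_emb / np.linalg.norm(img_diff_emb, axis=1, keepdims=True)
    b = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(a))       # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Symmetric: image-difference-to-text and text-to-image-difference
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The loss pulls each image-difference embedding toward its paired description and pushes it away from the other descriptions in the batch, which is the mechanism behind the fine-grained alignment the abstract refers to.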
Pages: 3108 - 3116
Number of pages: 9
Related papers
50 in total
  • [41] Learning Transferable User Representations with Sequential Behaviors via Contrastive Pre-training
    Cheng, Mingyue
    Yuan, Fajie
    Liu, Qi
    Xin, Xin
    Chen, Enhong
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 51 - 60
  • [42] Data Determines Distributional Robustness in Contrastive Language-Image Pre-training (CLIP)
    Fang, Alex
    Ilharco, Gabriel
    Wortsman, Mitchell
    Wan, Yuhao
    Shankar, Vaishaal
    Dave, Achal
    Schmidt, Ludwig
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [43] RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-training
    Xie, Chen-Wei
    Sun, Siyang
    Xiong, Xiong
    Zheng, Yun
    Zhao, Deli
    Zhou, Jingren
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19265 - 19274
  • [44] Understanding and Mitigating the Soft Error of Contrastive Language-Image Pre-training Models
    Shi, Yihao
    Wang, Bo
    Luo, Shengbai
    Xue, Qingshan
    Zhang, Xueyi
    Ma, Sheng
    8TH INTERNATIONAL TEST CONFERENCE IN ASIA, ITC-ASIA 2024, 2024,
  • [45] Supervised contrastive pre-training models for mammography screening
    Cao, Zhenjie
    Deng, Zhuo
    Yang, Zhicheng
    Ma, Jie
    Ma, Lan
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [46] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [47] VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
    Hu, Xiaowei
    Yin, Xi
    Lin, Kevin
    Zhang, Lei
    Gao, Jianfeng
    Wang, Lijuan
    Liu, Zicheng
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1575 - 1583
  • [48] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108
  • [49] Contrastive Ground-Level Image and Remote Sensing Pre-training Improves Representation Learning for Natural World Imagery
    Huynh, Andy V.
    Gillespie, Lauren E.
    Lopez-Saucedo, Jael
    Tang, Claire
    Sikand, Rohan
    Exposito-Alonso, Moises
    COMPUTER VISION - ECCV 2024, PT LXXX, 2025, 15138 : 173 - 190
  • [50] Contrastive semantic similarity learning for image captioning evaluation
    Zeng, Chao
    Kwong, Sam
    Zhao, Tiesong
    Wang, Hanli
    INFORMATION SCIENCES, 2022, 609 : 913 - 930