Scene graph fusion and negative sample generation strategy for image-text matching

Cited by: 0
Authors
Wang, Liqin [1 ,2 ,3 ]
Yang, Pengcheng [1 ]
Wang, Xu [1 ,2 ,3 ]
Xu, Zhihong [1 ,2 ,3 ]
Dong, Yongfeng [1 ,2 ,3 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence & Data Sci, Tianjin 300401, Peoples R China
[2] Hebei Prov Key Lab Big Data Calculat, Tianjin 300401, Peoples R China
[3] Hebei Data Driven Ind Intelligent Engn Res Ctr, Tianjin 300401, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2025, Vol. 81, Issue 1
Keywords
Image-text matching; Scene graph fusion; Explicit modeling; Negative sample;
DOI
10.1007/s11227-024-06652-2
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
In the field of image-text matching, scene graph-based approaches are commonly employed to detect semantic associations between entities across modalities, improving cross-modal interaction by capturing fine-grained associations. However, the associations between images and texts are often modeled only implicitly, leaving a semantic gap between image and text information. To address the lack of cross-modal information integration and to explicitly model fine-grained semantic information in images and texts, we propose a scene graph fusion and negative sample generation strategy for image-text matching (SGFNS). Furthermore, to strengthen the representation of the subtle features that distinguish similar images, we propose a negative sample generation strategy and introduce an additional loss function that incorporates the generated negative samples into training. In experiments, we verify the effectiveness of our model against current state-of-the-art models that use scene graphs directly.
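The abstract does not specify the loss formulation; as a hedged sketch, one common way to fold generated negatives into training (the kind of "extra loss function" the abstract describes) is to add a second hinge term next to the standard triplet ranking loss. The function names, the margin, and the weighting factor `alpha` below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(img, txt_pos, txt_neg, txt_gen_neg, margin=0.2, alpha=1.0):
    """Hinge ranking loss with an extra term for a generated negative.

    img, txt_pos   : embeddings of a matched image-text pair.
    txt_neg        : an ordinary in-batch negative caption embedding.
    txt_gen_neg    : a synthesized hard negative (e.g. a caption whose
                     scene-graph entities were perturbed) -- hypothetical.
    """
    s_pos = cosine_sim(img, txt_pos)
    # Standard hinge: push the random negative below the positive by `margin`.
    base = max(0.0, margin + cosine_sim(img, txt_neg) - s_pos)
    # Extra term: apply the same constraint to the generated hard negative.
    extra = max(0.0, margin + cosine_sim(img, txt_gen_neg) - s_pos)
    return base + alpha * extra
```

Under this sketch, a generated negative that is nearly as similar to the image as the true caption incurs a penalty even when ordinary in-batch negatives are already well separated.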
Pages: 22
Related papers
50 records
  • [41] Learning hierarchical embedding space for image-text matching
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    INTELLIGENT DATA ANALYSIS, 2024, 28 (03): 647-665
  • [42] A Neighbor-Aware Approach for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Zang, Wenyu
    Wang, Bin
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 3970-3974
  • [43] Similarity Contrastive Capsule Transformation for Image-Text Matching
    Zhang, Bin
    Sun, Ximin
    Li, Xiaoming
    Wang, Shuai
    Liu, Dan
    Jia, Jiangkai
    2023 9TH INTERNATIONAL CONFERENCE ON MECHATRONICS AND ROBOTICS ENGINEERING, ICMRE, 2023: 84-90
  • [44] Transformer Reasoning Network for Image-Text Matching and Retrieval
    Messina, Nicola
    Falchi, Fabrizio
    Esuli, Andrea
    Amato, Giuseppe
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 5222-5229
  • [45] Plug-and-Play Regulators for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Liu, Wei
    Ruan, Xiang
    Lu, Huchuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 2322-2334
  • [46] Synthesizing Counterfactual Samples for Effective Image-Text Matching
    Wei, Hao
    Wang, Shuhui
    Han, Xinzhe
    Xue, Zhe
    Ma, Bin
    Wei, Xiaoming
    Wei, Xiaolin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 4355-4364
  • [47] Position Focused Attention Network for Image-Text Matching
    Wang, Yaxiong
    Yang, Hao
    Qian, Xueming
    Ma, Lin
    Lu, Jing
    Li, Biao
    Fan, Xin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 3792-3798
  • [48] Composing Object Relations and Attributes for Image-Text Matching
    Pham, Khoi
    Huynh, Chuong
    Lim, Ser-Nam
    Shrivastava, Abhinav
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 14354-14363
  • [49] Generative label fused network for image-text matching
    Zhao, Guoshuai
    Zhang, Chaofeng
    Shang, Heng
    Wang, Yaxiong
    Zhu, Li
    Qian, Xueming
    KNOWLEDGE-BASED SYSTEMS, 2023, 263
  • [50] Text-Image Scene Graph Fusion for Multimodal Named Entity Recognition
    Cheng J.
    Long K.
    Zhang S.
    Zhang T.
    Ma L.
    Cheng S.
    Guo Y.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (06): 2828-2839