Scene graph fusion and negative sample generation strategy for image-text matching

Cited by: 0
Authors
Wang, Liqin [1 ,2 ,3 ]
Yang, Pengcheng [1 ]
Wang, Xu [1 ,2 ,3 ]
Xu, Zhihong [1 ,2 ,3 ]
Dong, Yongfeng [1 ,2 ,3 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence & Data Sci, Tianjin 300401, Peoples R China
[2] Hebei Prov Key Lab Big Data Calculat, Tianjin 300401, Peoples R China
[3] Hebei Data Driven Ind Intelligent Engn Res Ctr, Tianjin 300401, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2025, Vol. 81, Issue 1
Keywords
Image-text matching; Scene graph fusion; Explicit modeling; Negative sample;
DOI
10.1007/s11227-024-06652-2
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
In the field of image-text matching, scene graph-based approaches are commonly employed to detect semantic associations between entities across modalities, improving cross-modal interaction by capturing fine-grained associations. However, the associations between images and texts are often modeled only implicitly, leaving a semantic gap between image and text information. To address the lack of cross-modal information integration and to explicitly model fine-grained semantic information in images and texts, we propose a scene graph fusion and negative sample generation strategy for image-text matching (SGFNS). Furthermore, to strengthen the representation of the subtle features that distinguish similar images, we propose a negative sample generation strategy and introduce an additional loss function that incorporates the generated negative samples into training. In experiments, we verify the effectiveness of our model against current state-of-the-art models that use scene graphs directly.
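The abstract does not specify the loss formulation; as a hedged sketch, one common way to fold generated negatives into training (the kind of "extra loss function" the abstract describes) is to add a second hinge term next to the standard triplet ranking loss. The function names, the margin, and the weighting factor `alpha` below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(img, txt_pos, txt_neg, txt_gen_neg, margin=0.2, alpha=1.0):
    """Hinge ranking loss with an extra term for a generated negative.

    img, txt_pos   : embeddings of a matched image-text pair.
    txt_neg        : an ordinary in-batch negative caption embedding.
    txt_gen_neg    : a synthesized hard negative (e.g. a caption whose
                     scene-graph entities were perturbed) -- hypothetical.
    """
    s_pos = cosine_sim(img, txt_pos)
    # Standard hinge: push the random negative below the positive by `margin`.
    base = max(0.0, margin + cosine_sim(img, txt_neg) - s_pos)
    # Extra term: apply the same constraint to the generated hard negative.
    extra = max(0.0, margin + cosine_sim(img, txt_gen_neg) - s_pos)
    return base + alpha * extra
```

Under this sketch, a generated negative that is nearly as similar to the image as the true caption incurs a penalty even when ordinary in-batch negatives are already well separated.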
Pages: 22
Related papers
50 records
  • [41] Learning hierarchical embedding space for image-text matching
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    INTELLIGENT DATA ANALYSIS, 2024, 28 (03): 647-665
  • [42] A Neighbor-Aware Approach for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Zang, Wenyu
    Wang, Bin
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 3970-3974
  • [43] Similarity Contrastive Capsule Transformation for Image-Text Matching
    Zhang, Bin
    Sun, Ximin
    Li, Xiaoming
    Wang, Shuai
    Liu, Dan
    Jia, Jiangkai
    2023 9TH INTERNATIONAL CONFERENCE ON MECHATRONICS AND ROBOTICS ENGINEERING, ICMRE, 2023: 84-90
  • [44] Transformer Reasoning Network for Image-Text Matching and Retrieval
    Messina, Nicola
    Falchi, Fabrizio
    Esuli, Andrea
    Amato, Giuseppe
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 5222-5229
  • [45] Plug-and-Play Regulators for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Liu, Wei
    Ruan, Xiang
    Lu, Huchuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 2322-2334
  • [46] Synthesizing Counterfactual Samples for Effective Image-Text Matching
    Wei, Hao
    Wang, Shuhui
    Han, Xinzhe
    Xue, Zhe
    Ma, Bin
    Wei, Xiaoming
    Wei, Xiaolin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 4355-4364
  • [47] Position Focused Attention Network for Image-Text Matching
    Wang, Yaxiong
    Yang, Hao
    Qian, Xueming
    Ma, Lin
    Lu, Jing
    Li, Biao
    Fan, Xin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 3792-3798
  • [48] Composing Object Relations and Attributes for Image-Text Matching
    Pham, Khoi
    Huynh, Chuong
    Lim, Ser-Nam
    Shrivastava, Abhinav
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 14354-14363
  • [49] Generative label fused network for image-text matching
    Zhao, Guoshuai
    Zhang, Chaofeng
    Shang, Heng
    Wang, Yaxiong
    Zhu, Li
    Qian, Xueming
    KNOWLEDGE-BASED SYSTEMS, 2023, 263
  • [50] Text-Image Scene Graph Fusion for Multimodal Named Entity Recognition
    Cheng J.
    Long K.
    Zhang S.
    Zhang T.
    Ma L.
    Cheng S.
    Guo Y.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (06): 2828-2839