Scene Graph Semantic Inference for Image and Text Matching

Cited by: 15

Authors:

Pei, Jiaming [1]

Zhong, Kaiyang [2]

Yu, Zhi [3]

Wang, Lukun [4]

Lakshmanna, Kuruva [5]

Affiliations:

[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia

[2] Southwestern Univ Finance & Econ, Sch Comp & Artificial Intelligence, Sichuan 610030, Peoples R China

[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 40044, Peoples R China

[4] Shandong Univ Sci & Technol, Coll Intelligent Equipment, Qingdao 271019, Peoples R China

[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2023 / Vol. 22 / Issue 05

Keywords:

Image and text matching; scene graph; semantic inference;

DOI:

10.1145/3563390

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of information technology, image and text data have increased dramatically. Image and text matching techniques enable computers to understand information from both visual and text modalities and match them based on semantic content. Existing methods focus on visual and textual object co-occurrence statistics and learning coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for the understanding of scenes contained in images and text. In this article, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to facilitate bridging cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module for aggregating neighborhood information to refine the semantic representation of each object and achieve fine-level alignment of visual and textual modalities. We perform extensive experiments on Flickr30K and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.

Pages: 23

50 entries in total

[41] Progressive semantic aggregation and structured cognitive enhancement for image-text matching
Li, Mingyong
Gao, Yihua
Zhao, Honggang
Li, Ruiheng
Chen, Junyu
EXPERT SYSTEMS WITH APPLICATIONS, 2025, 274
[42] Cross-Modal Attention With Semantic Consistence for Image-Text Matching
Xu, Xing
Wang, Tan
Yang, Yang
Zuo, Lin
Shen, Fumin
Shen, Heng Tao
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5412 - 5425
[43] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
Liu, Yang
Liu, Hong
Wang, Huaqiu
Liu, Mengyuan
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
[44] Cross-modal Semantic Interference Suppression for image-text matching
Yao, Tao
Peng, Shouyong
Sun, Yujuan
Sheng, Guorui
Fu, Haiyan
Kong, Xiangwei
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
[45] Cross-modal Semantic Interference Suppression for image-text matching
Yao, Tao
Peng, Shouyong
Sun, Yujuan
Sheng, Guorui
Fu, Haiyan
Kong, Xiangwei
Engineering Applications of Artificial Intelligence, 2024, 133
[46] Local Alignment with Global Semantic Consistence Network for Image-Text Matching
Li, Pengwei
Wu, Shihua
Lian, Zhichao
2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022 : 652 - 657
[47] Multiple graph matching with Bayesian inference
Williams, ML
Wilson, RC
Hancock, ER
PATTERN RECOGNITION LETTERS, 1997, 18 (11-13) : 1275 - 1281
[48] Cross-modal Graph Matching Network for Image-text Retrieval
Cheng, Yuhao
Zhu, Xiaoguang
Qian, Jiuchao
Wen, Fei
Liu, Peilin
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
[49] SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Wang, Zhecan
You, Haoxuan
Li, Liunian Harold
Zareian, Alireza
Park, Suji
Liang, Yiqing
Chang, Kai-Wei
Chang, Shih-Fu
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022 : 5914 - 5922
[50] Gradient-Based Graph Attention for Scene Text Image Super-resolution
Zhu, Xiangyuan
Guo, Kehua
Fang, Hui
Ding, Rui
Wu, Zheng
Schaefer, Gerald
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023 : 3861 - 3869
