Multilateral Semantic Relations Modeling for Image Text Retrieval

Cited by: 14
Authors
Wang, Zheng [1 ,3 ]
Gao, Zhenwei [1 ]
Guo, Kangshuai [1 ]
Yang, Yang [1 ]
Wang, Xiaoming [1 ]
Shen, Heng Tao [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] UESTC Guangdong, Inst Elect & Informat Engn, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.00277
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval is a fundamental task that bridges vision and language, typically by exploiting various strategies for fine-grained alignment between regions and words. The task remains difficult mainly because of one-to-many correspondence, where an arbitrary query can match a set of instances from the other modality. While existing solutions to this problem, including multi-point mapping, probabilistic distribution, and geometric embedding, have made promising progress, one-to-many correspondence is still under-explored. In this work, we develop Multilateral Semantic Relations Modeling (MSRM) for image-text retrieval, which captures the one-to-many correspondence between multiple samples and a given query via hypergraph modeling. Specifically, a given query is first mapped to a probabilistic embedding to learn its true semantic distribution based on Mahalanobis distance. Then each candidate instance in a mini-batch is regarded as a hypergraph node with its mean semantics, while a Gaussian query is modeled as a hyperedge to capture semantic correlations beyond the pairwise relation between candidate points and the query. Comprehensive experimental results on two widely used datasets demonstrate that MSRM can outperform state-of-the-art methods in handling multiple matches while maintaining comparable performance on instance-level matching.
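The core idea of the abstract, a Gaussian query acting as a hyperedge over candidate nodes, can be illustrated with a small sketch. This is not the authors' implementation: the diagonal covariance, the fixed `radius` threshold, and the function names `mahalanobis_diag` and `build_hyperedge` are all assumptions made for illustration only.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from point x to a Gaussian with a
    diagonal covariance (assumption: variances given per dimension)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def build_hyperedge(candidates, query_mean, query_var, radius):
    """Return the indices of candidate nodes that fall within `radius`
    (a hypothetical threshold) of the Gaussian query. The returned index
    set plays the role of one hyperedge connecting several nodes."""
    return [
        i for i, c in enumerate(candidates)
        if mahalanobis_diag(c, query_mean, query_var) <= radius
    ]

# Candidate instances in a mini-batch, each represented by its mean embedding.
candidates = [[0.0, 0.0], [1.0, 2.0], [5.0, 5.0]]

# A query modeled as a Gaussian: mean plus per-dimension variance.
query_mean = [0.0, 0.0]
query_var = [1.0, 4.0]

edge = build_hyperedge(candidates, query_mean, query_var, radius=2.0)
print(edge)
```

Because the distance is scaled by the query's variance, a query with a broad distribution along one dimension links to candidates that a plain Euclidean threshold would exclude, which is one way to picture a single query matching several instances.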
Pages: 2830-2839
Page count: 10