Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1

Authors:

Li, Jiayi [1]

Jiang, Min [1]

Kong, Jun [2]

Tao, Xuefeng [2]

Luo, Xi [1]

Affiliations:

[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China

[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;

DOI:

10.1109/TMM.2024.3410129

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge inherent in the TBPR task revolves around how to map cross-modal information to a potential common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of semantic polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections. However, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, facilitated by the prowess of pre-trained cross-modal models. Firstly, to learn cross-modal information representations with better robustness, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP and mINP compared to existing methods.
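The abstract frames TBPR as ranking gallery images against a text query in a common embedding space and reports Rank-1, mAP and mINP. The following is a minimal sketch of that evaluation step only, not the LSPM model itself. The function name `retrieval_metrics`, the argument names, and the use of cosine similarity are assumptions made for illustration. mINP is computed under the definition commonly used in person re-identification work (number of correct matches divided by the rank of the hardest correct match, averaged over queries), which the record itself does not spell out.

```python
import numpy as np

def retrieval_metrics(text_emb, image_emb, text_ids, image_ids):
    """Rank gallery images for each text query and score the ranking.

    Hypothetical helper: embeddings are assumed to already live in a
    shared space; identity labels decide which matches are correct.
    """
    # L2-normalise so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sim = t @ v.T                                  # (num_queries, num_gallery)

    # Gallery indices sorted from most to least similar, per query.
    order = np.argsort(-sim, axis=1)
    hits = image_ids[order] == text_ids[:, None]   # True where identity matches

    rank1, aps, inps = [], [], []
    for row in hits:
        pos = np.flatnonzero(row) + 1              # 1-based ranks of correct matches
        if pos.size == 0:
            continue                               # query has no match in the gallery
        rank1.append(float(pos[0] == 1))
        # Precision at each correct match, averaged: average precision.
        aps.append(np.mean(np.arange(1, pos.size + 1) / pos))
        # Inverse negative penalty (assumed definition): |G| / rank of hardest match.
        inps.append(pos.size / pos[-1])

    return {
        "rank1": float(np.mean(rank1)),
        "mAP": float(np.mean(aps)),
        "mINP": float(np.mean(inps)),
    }
```

A toy call might pass two query embeddings and three gallery embeddings with integer identity labels; real evaluations would use the test splits of CUHK-PEDES or ICFG-PEDES and the embeddings produced by the trained model.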


Pages: 10678-10691

Page count: 14

50 items in total

[1] Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval
Li, Shenshen
He, Chen
Xu, Xing
Shen, Fumin
Yang, Yang
Shen, Heng Tao
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3172 - 3180
[2] DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval
Zhu, Aichun
Wang, Zijie
Li, Yifeng
Wan, Xili
Jin, Jing
Wang, Tian
Hu, Fangqiang
Hua, Gang
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 209 - 217
[3] Causality-Inspired Invariant Representation Learning for Text-Based Person Retrieval
Liu, Yu
Qin, Guihe
Chen, Haipeng
Cheng, Zhiyong
Yang, Xun
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 14052 - 14060
[4] LEARNING SEMANTIC-ALIGNED FEATURE REPRESENTATION FOR TEXT-BASED PERSON SEARCH
Li, Shiping
Cao, Min
Zhang, Min
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2724 - 2728
[5] Chatting with interactive memory for text-based person retrieval
He, Chen
Li, Shenshen
Wang, Zheng
Chen, Hua
Shen, Fumin
Xu, Xing
MULTIMEDIA SYSTEMS, 2025, 31 (01)
[6] DCEL: Deep Cross-modal Evidential Learning for Text-Based Person Retrieval
Li, Shenshen
Xu, Xing
Yang, Yang
Shen, Fumin
Mo, Yijun
Li, Yujie
Shen, Heng Tao
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6292 - 6300
[7] SUM: Serialized Updating and Matching for text-based person retrieval
Wang, Zijie
Zhu, Aichun
Xue, Jingyi
Jiang, Daihong
Liu, Chao
Li, Yifeng
Hu, Fangqiang
KNOWLEDGE-BASED SYSTEMS, 2022, 248
[8] Fine-grained Semantics-aware Representation Learning for Text-based Person Retrieval
Wang, Di
Yan, Feng
Wang, Yifeng
Zhao, Lin
Liang, Xiao
Zhong, Haodi
Zhang, Ronghua
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 92 - 100
[9] Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval
Shen, Fei
Shu, Xiangbo
Du, Xiaoyu
Tang, Jinhui
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 8922 - 8931
[10] Exploring fonts as retrieval cues in text-based learning
Krieglstein, Felix
Jansen, Sebastian
Meusel, Felicia
Scheller, Nadine
Schmitz, Manuel
Wesenberg, Lukas
Rey, Guenter Daniel
ACTA PSYCHOLOGICA, 2024, 251
