Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

Cited by: 0

Authors:

Li, Shenshen ^{[1,2]}

He, Chen ^{[1,2]}

Xu, Xing ^{[1,2]}

Shen, Fumin ^{[1,2]}

Yang, Yang ^{[1,2]}

Shen, Heng Tao ^{[1,2]}

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China

[2] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-based person retrieval aims at retrieving a specific pedestrian image from a gallery based on textual descriptions. The primary challenge is overcoming the inherent heterogeneous modality gap under significant intra-class variation and minimal inter-class variation. Existing approaches commonly employ vision-language pre-training or attention mechanisms to learn appropriate cross-modal alignments from noisy inputs. Despite commendable progress, current methods inevitably suffer from two defects: 1) Matching ambiguity, which mainly derives from unreliable matching pairs; 2) One-sided cross-modal alignments, stemming from the absence of exploring one-to-many correspondence, i.e., coarse-grained semantic alignment. These critical issues significantly deteriorate retrieval performance. To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL) for text-based person retrieval from the uncertainty perspective. Specifically, our AUL framework consists of three key components: 1) Uncertainty-aware Matching Filtration, which leverages Subjective Logic to effectively mitigate the disturbance of unreliable matching pairs and select high-confidence cross-modal matches for training; 2) Uncertainty-based Alignment Refinement, which not only simulates coarse-grained alignments by constructing uncertainty representations but also performs progressive learning to incorporate coarse- and fine-grained alignments properly; 3) Cross-modal Masked Modeling, which aims at exploring more comprehensive relations between vision and language. Extensive experiments demonstrate that our AUL method consistently achieves state-of-the-art performance on three benchmark datasets in supervised, weakly supervised, and domain generalization settings. Our code is available at https://github.com/CFM-MSG/Code-AUL.

Pages: 3172 - 3180

Number of pages: 9

50 items in total

[31] Text-based Person Search via Multi-Granularity Embedding Learning
Wang, Chengji
Luo, Zhiming
Lin, Yaojin
Li, Shaozi
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1068 - 1074
[32] Text-based person search via cross-modal alignment learning
Ke, Xiao
Liu, Hao
Xu, Peirong
Lin, Xinru
Guo, Wenzhong
PATTERN RECOGNITION, 2024, 152
[33] LEARNING SEMANTIC-ALIGNED FEATURE REPRESENTATION FOR TEXT-BASED PERSON SEARCH
Li, Shiping
Cao, Min
Zhang, Min
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2724 - 2728
[34] Uncertainty-based Adaptive AXBT Sampling with SPOTS
DelBalzo, Donald R.
Klicka, Joseph
OCEANS 2009, VOLS 1-3, 2009, : 1162 - +
[35] Improving embedding learning by virtual attribute decoupling for text-based person search
Wang, Chengji
Luo, Zhiming
Lin, Yaojin
Li, Shaozi
Neural Computing and Applications, 2022, 34 : 5625 - 5647
[36] CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval
Wang, Zijie
Zhu, Aichun
Xue, Jingyi
Wan, Xili
Liu, Chao
Wang, Tian
Li, Yifeng
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5314 - 5322
[37] Text-Based Image Retrieval using Progressive Multi-Instance Learning
Li, Wen
Duan, Lixin
Xu, Dong
Tsang, Ivor Wai-Hung
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 2049 - 2055
[38] Improving Text-Based Person Retrieval by Excavating All-Round Information Beyond Color
Zhu, Aichun
Wang, Zijie
Xue, Jingyi
Wan, Xili
Jin, Jing
Wang, Tian
Snoussi, Hichem
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 15
[39] Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization
Li, Shenshen
Xu, Xing
Shen, Fumin
Yang, Yang
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 307 - 315
[40] Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions
Xie, Huang
Khorrami, Khazar
Rasanen, Okko
Virtanen, Tuomas
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 221 - 225