Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

Cited by: 0

Authors:

Li, Shenshen [1,2]

He, Chen [1,2]

Xu, Xing [1,2]

Shen, Fumin [1,2]

Yang, Yang [1,2]

Shen, Heng Tao [1,2]

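Affiliations:

[1] University of Electronic Science and Technology of China, School of Computer Science and Engineering, Chengdu, People's Republic of China

[2] University of Electronic Science and Technology of China, Center for Future Media, Chengdu, People's Republic of China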

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4 | 2024

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-based person retrieval aims at retrieving a specific pedestrian image from a gallery based on textual descriptions. The primary challenge is how to overcome the inherent heterogeneous modality gap under significant intra-class variation and minimal inter-class variation. Existing approaches commonly employ vision-language pre-training or attention mechanisms to learn appropriate cross-modal alignments from noisy inputs. Despite commendable progress, current methods inevitably suffer from two defects: 1) Matching ambiguity, which mainly derives from unreliable matching pairs; 2) One-sided cross-modal alignments, stemming from the absence of exploring one-to-many correspondence, i.e., coarse-grained semantic alignment. These critical issues significantly deteriorate retrieval performance. To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL) for text-based person retrieval from an uncertainty perspective. Specifically, our AUL framework consists of three key components: 1) Uncertainty-aware Matching Filtration, which leverages Subjective Logic to effectively mitigate the disturbance of unreliable matching pairs and select high-confidence cross-modal matches for training; 2) Uncertainty-based Alignment Refinement, which not only simulates coarse-grained alignments by constructing uncertainty representations but also performs progressive learning to incorporate coarse- and fine-grained alignments properly; 3) Cross-modal Masked Modeling, which aims at exploring more comprehensive relations between vision and language. Extensive experiments demonstrate that our AUL method consistently achieves state-of-the-art performance on three benchmark datasets in supervised, weakly supervised, and domain generalization settings. Our code is available at https://github.com/CFM-MSG/Code-AUL.
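The abstract does not spell out how Subjective Logic produces a confidence signal, so the following is only a minimal sketch of the standard evidential formulation, not the authors' implementation. It assumes each image-text pair comes with non-negative evidence for K outcomes (here K = 2, "match" and "non-match"), forms the Dirichlet strength S = sum(e_k + 1), and derives belief masses b_k = e_k / S and uncertainty u = K / S. The function names, the evidence values, and the 0.5 threshold are hypothetical and chosen for illustration.

```python
def subjective_opinion(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Standard Subjective Logic / Dirichlet formulation (an assumption here,
    not taken from the paper): S = sum(e_k + 1), b_k = e_k / S, u = K / S.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def filter_pairs(pair_evidence, max_uncertainty=0.5):
    """Return indices of pairs whose uncertainty is at most the threshold.

    `max_uncertainty` is a hypothetical hyperparameter for this sketch.
    """
    kept = []
    for index, evidence in enumerate(pair_evidence):
        _, uncertainty = subjective_opinion(evidence)
        if uncertainty <= max_uncertainty:
            kept.append(index)
    return kept


# Hypothetical [match, non-match] evidence for three image-text pairs.
example_evidence = [[8.0, 0.0], [0.2, 0.1], [3.0, 1.0]]
confident_pairs = filter_pairs(example_evidence)
```

In this formulation the belief masses and the uncertainty always sum to one, so a pair with little total evidence is assigned high uncertainty and would be excluded from training under the assumed threshold.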


Pages: 3172-3180

Page count: 9

