Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

Cited by: 0
Authors
Li, Shenshen [1, 2]
He, Chen [1, 2]
Xu, Xing [1, 2]
Shen, Fumin [1, 2]
Yang, Yang [1, 2]
Shen, Heng Tao [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
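Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract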
Text-based person retrieval aims at retrieving a specific pedestrian image from a gallery based on textual descriptions. The primary challenge is how to overcome the inherent heterogeneous modality gap under significant intra-class variation and minimal inter-class variation. Existing approaches commonly employ vision-language pre-training or attention mechanisms to learn appropriate cross-modal alignments from noisy inputs. Despite commendable progress, current methods inevitably suffer from two defects: 1) Matching ambiguity, which mainly derives from unreliable matching pairs; 2) One-sided cross-modal alignments, stemming from the absence of exploring one-to-many correspondence, i.e., coarse-grained semantic alignment. These critical issues significantly deteriorate retrieval performance. To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL) for text-based person retrieval from the uncertainty perspective. Specifically, our AUL framework consists of three key components: 1) Uncertainty-aware Matching Filtration, which leverages Subjective Logic to effectively mitigate the disturbance of unreliable matching pairs and select high-confidence cross-modal matches for training; 2) Uncertainty-based Alignment Refinement, which not only simulates coarse-grained alignments by constructing uncertainty representations but also performs progressive learning to incorporate coarse- and fine-grained alignments properly; 3) Cross-modal Masked Modeling, which aims at exploring more comprehensive relations between vision and language. Extensive experiments demonstrate that our AUL method consistently achieves state-of-the-art performance on three benchmark datasets in supervised, weakly supervised, and domain generalization settings. Our code is available at https://github.com/CFM-MSG/Code-AUL.
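The abstract names Subjective Logic as the basis for filtering unreliable matching pairs but does not give the formulas. As a rough illustration only, the sketch below uses the standard Subjective Logic opinion commonly adopted in evidential learning: for K classes with non-negative evidence e_k, the Dirichlet strength is S = sum(e_k + 1), the belief masses are b_k = e_k / S, and the uncertainty mass is u = K / S. The function names, the two-class "match / no match" evidence layout, and the threshold `max_uncertainty` are assumptions made for this example and are not taken from the AUL paper or its code.

```python
from typing import List, Sequence, Tuple


def subjective_opinion(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Turn non-negative per-class evidence into belief masses and an uncertainty mass.

    Standard Subjective Logic mapping: S = sum(e_k + 1), b_k = e_k / S, u = K / S,
    so that sum(b_k) + u == 1.
    """
    if len(evidence) == 0:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def filter_pairs(
    pair_evidence: Sequence[Sequence[float]],
    max_uncertainty: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Return indices of pairs whose uncertainty mass is at most max_uncertainty."""
    kept = []
    for index, evidence in enumerate(pair_evidence):
        _, uncertainty = subjective_opinion(evidence)
        if uncertainty <= max_uncertainty:
            kept.append(index)
    return kept


if __name__ == "__main__":
    # Hypothetical evidence for (match, no match) on three image-text pairs.
    pairs = [[8.0, 0.0], [0.5, 0.5], [0.0, 18.0]]
    print(filter_pairs(pairs))
```

Under this formulation, a pair with little total evidence (the second one above) receives a large uncertainty mass and is dropped, while pairs with strong evidence in either direction are kept. How AUL actually produces evidence and selects pairs is described in the paper and the linked repository.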
Pages: 3172-3180
Page count: 9