Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

Cited by: 0
Authors
Li, Shenshen [1 ,2 ]
He, Chen [1 ,2 ]
Xu, Xing [1 ,2 ]
Shen, Fumin [1 ,2 ]
Yang, Yang [1 ,2 ]
Shen, Heng Tao [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
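Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract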
Text-based person retrieval aims to retrieve a specific pedestrian image from a gallery given a textual description. The primary challenge is bridging the inherent heterogeneous modality gap when intra-class variation is large and inter-class variation is small. Existing approaches commonly employ vision-language pre-training or attention mechanisms to learn cross-modal alignments from noisy inputs. Despite commendable progress, current methods suffer from two defects: 1) matching ambiguity, which mainly derives from unreliable matching pairs; and 2) one-sided cross-modal alignments, which stem from not exploring one-to-many correspondence, i.e., coarse-grained semantic alignment. These issues significantly degrade retrieval performance. To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL), which approaches text-based person retrieval from the uncertainty perspective. Specifically, AUL consists of three key components: 1) Uncertainty-aware Matching Filtration, which leverages Subjective Logic to mitigate the disturbance of unreliable matching pairs and select high-confidence cross-modal matches for training; 2) Uncertainty-based Alignment Refinement, which not only simulates coarse-grained alignments by constructing uncertainty representations but also performs progressive learning to properly combine coarse- and fine-grained alignments; and 3) Cross-modal Masked Modeling, which explores more comprehensive relations between vision and language. Extensive experiments demonstrate that AUL consistently achieves state-of-the-art performance on three benchmark datasets in supervised, weakly supervised, and domain generalization settings. Our code is available at https://github.com/CFM-MSG/Code-AUL.
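The abstract does not spell out how Subjective Logic yields an uncertainty score, so the following is only a minimal sketch of the textbook mapping from non-negative evidence to belief masses and an uncertainty mass, followed by a simple threshold filter. It is not the authors' implementation: the function names `subjective_opinion` and `filter_pairs`, the two-class (match / non-match) evidence layout, and the `max_uncertainty` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def subjective_opinion(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Map non-negative evidence for K classes to belief masses and uncertainty.

    Standard Subjective Logic / Dirichlet mapping:
    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S,
    so that sum(b) + u == 1.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def filter_pairs(
    pair_evidence: Sequence[Sequence[float]],
    max_uncertainty: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Return indices of pairs whose uncertainty mass is at most the threshold."""
    kept = []
    for idx, evidence in enumerate(pair_evidence):
        _, uncertainty = subjective_opinion(evidence)
        if uncertainty <= max_uncertainty:
            kept.append(idx)
    return kept


# Hypothetical evidence per image-text pair: [match, non-match].
example_evidence = [[9.0, 1.0], [0.2, 0.3], [0.0, 8.0]]
kept_indices = filter_pairs(example_evidence)
```

In this sketch a pair with little total evidence, such as the second one, receives a large uncertainty mass and is dropped, while pairs with strong evidence in either direction are retained. How AUL actually derives evidence from its similarity scores and how it sets the selection rule is described in the paper and the linked repository, not here.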
Pages: 3172-3180
Page count: 9