Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

Authors
Li, Shenshen [1, 2]
He, Chen [1, 2]
Xu, Xing [1, 2]
Shen, Fumin [1, 2]
Yang, Yang [1, 2]
Shen, Heng Tao [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-based person retrieval aims at retrieving a specific pedestrian image from a gallery based on textual descriptions. The primary challenge is how to overcome the inherent heterogeneous modality gap under significant intra-class variation and minimal inter-class variation. Existing approaches commonly employ vision-language pre-training or attention mechanisms to learn appropriate cross-modal alignments from noisy inputs. Despite commendable progress, current methods inevitably suffer from two defects: 1) Matching ambiguity, which mainly derives from unreliable matching pairs; 2) One-sided cross-modal alignments, stemming from the absence of exploring one-to-many correspondence, i.e., coarse-grained semantic alignment. These critical issues significantly deteriorate retrieval performance. To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL) for text-based person retrieval from the uncertainty perspective. Specifically, our AUL framework consists of three key components: 1) Uncertainty-aware Matching Filtration, which leverages Subjective Logic to effectively mitigate the disturbance of unreliable matching pairs and select high-confidence cross-modal matches for training; 2) Uncertainty-based Alignment Refinement, which not only simulates coarse-grained alignments by constructing uncertainty representations but also performs progressive learning to incorporate coarse- and fine-grained alignments properly; 3) Cross-modal Masked Modeling, which aims at exploring more comprehensive relations between vision and language. Extensive experiments demonstrate that our AUL method consistently achieves state-of-the-art performance on three benchmark datasets in supervised, weakly supervised, and domain generalization settings. Our code is available at https://github.com/CFM-MSG/Code-AUL.
Pages: 3172-3180
Page count: 9