Self-Supervised Modality-Aware Multiple Granularity Pre-Training for RGB-Infrared Person Re-Identification

Times Cited: 5
Authors
Wan, Lin [1 ]
Jing, Qianyan [1 ]
Sun, Zongyuan [1 ]
Zhang, Chuang [2 ]
Li, Zhihang [3 ]
Chen, Yehansen [1 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430078, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Keywords
Task analysis; Training; Feature extraction; Lighting; Cameras; Visualization; Self-supervised learning; Cross-modality person re-identification; self-supervised learning; multi-modality pre-training;
DOI
10.1109/TIFS.2023.3273911
CLC Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
RGB-Infrared person re-identification (RGB-IR ReID) aims to associate people across disjoint RGB and IR camera views. Currently, the state-of-the-art performance of RGB-IR ReID is not as impressive as that of conventional ReID. Much of this gap is due to the notorious modality-bias training issue introduced by single-modality ImageNet pre-training, which can yield RGB-biased representations that severely hinder cross-modality image retrieval. This paper makes the first attempt to tackle the task from a pre-training perspective. We propose a self-supervised pre-training solution, named Modality-Aware Multiple Granularity Learning (MMGL), which trains models from scratch solely on multi-modal ReID datasets, yet achieves results competitive with ImageNet pre-training without using any external data or sophisticated tuning tricks. First, we develop a simple but effective 'permutation recovery' pretext task that globally maps shuffled RGB-IR images into a shared latent permutation space, providing modality-invariant global representations for downstream ReID tasks. Second, we present a part-aware cycle-contrastive (PCC) learning strategy that exploits cross-modality cycle consistency to maximize agreement between semantically similar RGB-IR image patches. This enables contrastive learning in unpaired multi-modal scenarios and further improves the discriminability of local features without laborious instance augmentation. Based on these designs, MMGL effectively alleviates the modality-bias training problem. Extensive experiments demonstrate that it learns better representations (+8.03% Rank-1 accuracy) with faster training (converging in only a few hours) and higher data efficiency (< 5% of the data size) than ImageNet pre-training. The results also suggest that it generalizes well to various existing models and losses, and transfers promisingly across datasets. The code will be released at https://github.com/hansonchen1996/MMGL.
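To make the 'permutation recovery' pretext idea concrete, the following is a minimal, self-contained PyTorch sketch of a patch-shuffling-and-recovery objective applied identically to RGB and IR batches. Everything here is an assumption for illustration (the `PermutationRecovery` class name, the 2x2 patch grid, the ResNet-18 backbone, the input sizes); it is not the authors' released MMGL implementation, whose details are in the linked repository.

```python
# Illustrative sketch of a permutation-recovery pretext task (assumed design,
# not the official MMGL code): cut each image into a 2x2 patch grid, shuffle
# the patches by a randomly sampled permutation, and train a classifier to
# predict which permutation was applied. Feeding RGB and IR batches through
# the same network places both modalities in one shared permutation space.
import itertools
import random

import torch
import torch.nn as nn
import torchvision


class PermutationRecovery(nn.Module):
    def __init__(self, grid=2):
        super().__init__()
        self.grid = grid
        # All permutations of the patch grid (4! = 24 classes for a 2x2 grid).
        self.perms = list(itertools.permutations(range(grid * grid)))
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # drop the ImageNet classifier head
        self.encoder = backbone                # 512-d global feature
        self.classifier = nn.Linear(512, len(self.perms))

    def shuffle(self, images):
        """Apply a random patch permutation per image; return shuffled images and labels."""
        _, _, h, w = images.shape
        g = self.grid
        ph, pw = h // g, w // g
        shuffled, labels = [], []
        for img in images:
            # Cut the image into g*g patches in row-major order.
            patches = [img[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                       for i in range(g) for j in range(g)]
            k = random.randrange(len(self.perms))
            order = self.perms[k]
            # Reassemble the patches in the permuted order.
            rows = [torch.cat([patches[order[i * g + j]] for j in range(g)], dim=2)
                    for i in range(g)]
            shuffled.append(torch.cat(rows, dim=1))
            labels.append(k)
        return torch.stack(shuffled), torch.tensor(labels)

    def forward(self, images):
        shuffled, labels = self.shuffle(images)
        logits = self.classifier(self.encoder(shuffled))
        return nn.functional.cross_entropy(logits, labels)


# Usage sketch: random tensors stand in for RGB and IR batches; the same model
# recovers permutations for both modalities, so no cross-modal pairing is needed.
model = PermutationRecovery()
rgb = torch.randn(4, 3, 224, 224)   # RGB batch (stand-in data)
ir = torch.randn(4, 3, 224, 224)    # IR batch, replicated to 3 channels
loss = model(rgb) + model(ir)
loss.backward()
```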
Pages: 3044-3057
Page count: 14
Related Papers
50 records in total
  • [41] Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification
    Zhang, Can
    Liu, Hong
    Guo, Wei
    Ye, Mang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8679 - 8686
  • [42] MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification
    Gao, Yajun
    Liang, Tengfei
    Jin, Yi
    Gu, Xiaoyan
    Liu, Wu
    Li, Yidong
    Lang, Congyan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5257 - 5265
  • [43] Large-Scale Pre-training for Person Re-identification with Noisy Labels
    Fu, Dengpan
    Chen, Dongdong
    Yang, Hao
    Bao, Jianmin
    Yuan, Lu
    Zhang, Lei
    Li, Houqiang
    Wen, Fang
    Chen, Dong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2466 - 2476
  • [44] Synthesizing efficient data with diffusion models for person re-identification pre-training
    Niu, Ke
    Yu, Haiyang
    Qian, Xuelin
    Fu, Teng
    Li, Bin
    Xue, Xiangyang
    MACHINE LEARNING, 2025, 114 (03)
  • [45] Self-Supervised Consistency Based on Joint Learning for Unsupervised Person Re-identification
    Lou, Xulei
    Wu, Tinghui
    Hu, Haifeng
    Chen, Dihu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)
  • [46] Self-Supervised Recovery and Guide for Low-Resolution Person Re-Identification
    Han, Ke
    Huang, Yan
    Wang, Liang
    Liu, Zikun
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 6252 - 6263
  • [47] Alleviating Modality Bias Training for Infrared-Visible Person Re-Identification
    Huang, Yan
    Wu, Qiang
    Xu, Jingsong
    Zhong, Yi
    Zhang, Peng
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1570 - 1582
  • [48] Cross-Modal Cross-Domain Dual Alignment Network for RGB-Infrared Person Re-Identification
    Fu, Xiaowei
    Huang, Fuxiang
    Zhou, Yuhang
    Ma, Huimin
    Xu, Xin
    Zhang, Lei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6874 - 6887
  • [49] Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
    Shao, Zhiyin
    Zhang, Xinyu
    Ding, Changxing
    Wang, Jian
    Wang, Jingdong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11140 - 11150
  • [50] Attribute-Identity Embedding and Self-Supervised Learning for Scalable Person Re-Identification
    Li, Huafeng
    Yan, Shuanglin
    Yu, Zhengtao
    Tao, Dapeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (10) : 3472 - 3485