PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders

Cited by: 0
Authors
Hu, Hezhen [1]
Dong, Xiaoyi [1]
Bao, Jianmin [2]
Chen, Dongdong [3]
Yuan, Lu
Chen, Dong [2]
Li, Houqiang [1]
Affiliations
[1] Univ Sci & Technol China, Hefei 230027, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
[3] Microsoft, Redmond, WA 98052 USA
Keywords
Task analysis; Pedestrians; Semantics; Robustness; Decoding; Visualization; Image color analysis; High-quality ReID representation; masked autoencoder; pre-training; NETWORK; GAN;
DOI
10.1109/TMM.2024.3405649
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Pre-training plays an increasingly important role in learning generic feature representations for person re-identification (ReID). We argue that a high-quality ReID representation should have three properties: multi-level awareness, occlusion robustness, and cross-region invariance. To this end, we propose a simple yet effective pre-training framework, PersonMAE, which incorporates two core designs into masked autoencoders to better serve the ReID task. 1) PersonMAE generates two regions from the given image, with RegionA as the input and RegionB as the prediction target. RegionA is corrupted with block-wise masking to mimic the occlusion common in ReID, and its remaining visible parts are fed into the encoder. 2) PersonMAE then predicts the whole of RegionB at both the pixel level and the semantic feature level. This design encourages the pre-trained feature representations to exhibit the three properties above. These properties make PersonMAE well suited to downstream person ReID tasks, leading to state-of-the-art performance in four downstream ReID settings: supervised (holistic and occluded) and unsupervised (UDA and USL). Notably, in the commonly adopted supervised setting, PersonMAE with a ViT-B backbone achieves 79.8% and 69.5% mAP on the MSMT17 and OccDuke datasets, surpassing the previous state of the art by large margins of +8.0 and +5.3 mAP, respectively.
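The abstract gives no implementation details, so the following is only a minimal sketch of the input-preparation step it describes: sampling two regions from one image and corrupting RegionA with block-wise masking. The function names, the 256x128 image size, the 16x8 patch grid, the crop scale range, the 0.5 mask ratio, and the maximum block size are all illustrative assumptions, not values taken from the paper. The encoder, decoder, and the pixel-level and feature-level prediction losses are not covered.

```python
import random


def sample_region(img_h, img_w, scale=(0.5, 1.0), rng=random):
    """Sample a random crop box (top, left, height, width) inside the image.

    The scale range is an assumed hyperparameter, not one from the paper.
    """
    frac = rng.uniform(*scale)
    h = max(1, int(img_h * frac))
    w = max(1, int(img_w * frac))
    top = rng.randint(0, img_h - h)
    left = rng.randint(0, img_w - w)
    return top, left, h, w


def blockwise_mask(grid_h, grid_w, mask_ratio=0.5, max_block=4, rng=random):
    """Hide rectangular blocks of patches until at least mask_ratio is masked.

    Returns a grid of booleans where True marks a masked patch.
    """
    mask = [[False] * grid_w for _ in range(grid_h)]
    target = int(round(mask_ratio * grid_h * grid_w))
    masked = 0
    while masked < target:
        # Draw a random block size and position, then mask every patch in it.
        bh = rng.randint(1, min(max_block, grid_h))
        bw = rng.randint(1, min(max_block, grid_w))
        top = rng.randint(0, grid_h - bh)
        left = rng.randint(0, grid_w - bw)
        for r in range(top, top + bh):
            for c in range(left, left + bw):
                if not mask[r][c]:
                    mask[r][c] = True
                    masked += 1
    return mask


def visible_indices(mask):
    """Flattened indices of the unmasked patches (the ones an encoder would see)."""
    grid_w = len(mask[0])
    return [r * grid_w + c
            for r, row in enumerate(mask)
            for c, is_masked in enumerate(row)
            if not is_masked]


rng = random.Random(0)
region_a = sample_region(256, 128, rng=rng)  # input region
region_b = sample_region(256, 128, rng=rng)  # prediction-target region
mask = blockwise_mask(16, 8, mask_ratio=0.5, rng=rng)
visible = visible_indices(mask)
```

In this sketch only the patches listed in `visible` would be passed to an encoder, while `region_b` marks the area a decoder would be asked to predict. Because whole blocks are masked at once, the final masked fraction can slightly exceed `mask_ratio`.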
Pages: 10029-10040
Number of pages: 12
Related papers
50 records in total
  • [1] Unsupervised Pre-training for Person Re-identification
    Fu, Dengpan
    Chen, Dongdong
    Bao, Jianmin
    Yang, Hao
    Yuan, Lu
    Zhang, Lei
    Li, Houqiang
    Chen, Dong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 14745 - 14754
  • [2] Large-Scale Pre-training for Person Re-identification with Noisy Labels
    Fu, Dengpan
    Chen, Dongdong
    Yang, Hao
    Bao, Jianmin
    Yuan, Lu
    Zhang, Lei
    Li, Houqiang
    Wen, Fang
    Chen, Dong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2466 - 2476
  • [3] Synthesizing efficient data with diffusion models for person re-identification pre-training
    Niu, Ke
    Yu, Haiyang
    Qian, Xuelin
    Fu, Teng
    Li, Bin
    Xue, Xiangyang
    MACHINE LEARNING, 2025, 114 (03)
  • [4] Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
    Shao, Zhiyin
    Zhang, Xinyu
    Ding, Changxing
    Wang, Jian
    Wang, Jingdong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11140 - 11150
  • [5] Unified pre-training with pseudo infrared images for visible-infrared person re-identification
    Liu, ZhiGang
    Hu, Yan
    Multimedia Tools and Applications, 2024, 83 (38) : 86039 - 86058
  • [6] Self-supervised Pre-training with Learnable Tokenizers for Person Re-Identification in Railway Stations
    Yang, Enze
    Li, Chao
    Liu, Shuoyan
    Liu, Yuxin
    Zhao, Shitao
    Huang, Nan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 325 - 330
  • [7] Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification
    Yang, Zizheng
    Jin, Xin
    Zheng, Kecheng
    Zhao, Feng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14278 - 14287
  • [8] PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification
    Zhu, Kuan
    Guo, Haiyun
    Yan, Tianyi
    Zhu, Yousong
    Wang, Jinqiao
    Tang, Ming
    COMPUTER VISION - ECCV 2022, PT XIV, 2022, 13674 : 198 - 214
  • [9] Efficient Image Pre-training with Siamese Cropped Masked Autoencoders
    Eymael, Alexandre
    Vandeghen, Renaud
    Cioppa, Anthony
    Giancola, Silvio
    Ghanem, Bernard
    Van Droogenbroeck, Marc
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 348 - 366
  • [10] Contrastive learning-based joint pre-training for unsupervised domain adaptive person re-identification
    Wang, Jing
    Li, Xiaohong
    Dai, Xuesong
    Zhuang, Shuo
    Qi, Meibin
    MULTIMEDIA SYSTEMS, 2025, 31 (02)