PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders

Citations: 0
Authors
Hu, Hezhen [1 ]
Dong, Xiaoyi [1 ]
Bao, Jianmin [2 ]
Chen, Dongdong [3 ]
Yuan, Lu
Chen, Dong [2 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230027, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
[3] Microsoft, Redmond, WA 98052 USA
Keywords
Task analysis; Pedestrians; Semantics; Robustness; Decoding; Visualization; Image color analysis; High-quality ReID representation; masked autoencoder; pre-training; NETWORK; GAN;
DOI
10.1109/TMM.2024.3405649
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Pre-training is playing an increasingly important role in learning generic feature representations for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties: multi-level awareness, occlusion robustness, and cross-region invariance. To this end, we propose a simple yet effective pre-training framework, PersonMAE, which incorporates two core designs into masked autoencoders to better serve the Person ReID task. 1) PersonMAE generates two regions from a given image, with RegionA as the input and RegionB as the prediction target. RegionA is corrupted with block-wise masking to mimic the occlusions common in ReID, and only its remaining visible parts are fed into the encoder. 2) PersonMAE then predicts the whole RegionB at both the pixel level and the semantic feature level. This encourages the pre-trained feature representation to possess the three properties mentioned above, making PersonMAE well suited to downstream Person ReID tasks and leading to state-of-the-art performance on four of them: supervised (holistic and occluded settings) and unsupervised (UDA and USL settings). Notably, under the commonly adopted supervised setting, PersonMAE with a ViT-B backbone achieves 79.8% and 69.5% mAP on the MSMT17 and OccDuke datasets, surpassing the previous state-of-the-art by large margins of +8.0 mAP and +5.3 mAP, respectively.
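To make the two core designs concrete, the following is a minimal, hypothetical PyTorch sketch of the pre-training flow described in the abstract: block-wise masking of RegionA, encoding only its visible patches, and predicting the whole RegionB at both the pixel level and the semantic-feature level. All names here (block_wise_mask, PersonMAESketch, the teacher module used as the source of feature-level targets, and the mask geometry) are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    def block_wise_mask(grid_h=16, grid_w=8, block_h=4, block_w=2, mask_ratio=0.5):
        """Build a boolean patch mask (True = masked) out of rectangular blocks,
        mimicking the occlusions that commonly appear in person ReID images."""
        mask = torch.zeros(grid_h, grid_w, dtype=torch.bool)
        target = int(mask_ratio * grid_h * grid_w)
        while mask.sum() < target:
            top = torch.randint(0, grid_h - block_h + 1, (1,)).item()
            left = torch.randint(0, grid_w - block_w + 1, (1,)).item()
            mask[top:top + block_h, left:left + block_w] = True
        return mask.flatten()


    class PersonMAESketch(nn.Module):
        """Cross-region masked autoencoding: the encoder sees only the visible
        patches of RegionA; the decoder predicts the whole RegionB at both the
        pixel level and the semantic-feature level."""

        def __init__(self, encoder, decoder, pixel_head, feat_head, teacher):
            super().__init__()
            self.encoder = encoder        # e.g. a ViT-B backbone over patch tokens
            self.decoder = decoder        # lightweight decoder over the full token grid
            self.pixel_head = pixel_head  # regresses RGB patches of RegionB
            self.feat_head = feat_head    # regresses semantic features of RegionB
            self.teacher = teacher        # assumed source of feature-level targets

        def forward(self, region_a_tokens, region_b_pixels, region_b_feat_input, mask):
            # 1) keep only the visible (unmasked) RegionA tokens and encode them
            visible = region_a_tokens[:, ~mask]
            latent = self.encoder(visible)

            # 2) decode and predict the *whole* RegionB
            decoded = self.decoder(latent)
            pixel_pred = self.pixel_head(decoded)
            feat_pred = self.feat_head(decoded)

            # feature-level target; how it is produced is an assumption of this sketch
            with torch.no_grad():
                feat_target = self.teacher(region_b_feat_input)

            loss_pixel = F.mse_loss(pixel_pred, region_b_pixels)
            loss_feat = F.mse_loss(feat_pred, feat_target)
            return loss_pixel + loss_feat

The abstract specifies the masking strategy, the cross-region input/target split, and the dual-level prediction; the choice of heads, losses, and feature-target network above is filled in only for illustration.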
Pages: 10029-10040
Page count: 12
Related Papers
50 records in total
  • [21] DCSG: data complement pseudo-label refinement and self-guided pre-training for unsupervised person re-identification
    Han, Qing
    Chen, Jiongjin
    Min, Weidong
    Li, Jiahao
    Zhan, Lixin
    Li, Longfei
    VISUAL COMPUTER, 2024, 40 (10): 7235 - 7248
  • [22] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training
    Chen, Zhihong
    Du, Yuhao
    Hu, Jinpeng
    Liu, Yang
    Li, Guanbin
    Wan, Xiang
    Chang, Tsung-Hui
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT V, 2022, 13435 : 679 - 689
  • [23] Dual similarity pre-training and domain difference encouragement learning for vehicle re-identification in the wild
    Wang, Qi
    Zhong, Yuling
    Min, Weidong
    Zhao, Haoyu
    Gai, Di
    Han, Qing
    PATTERN RECOGNITION, 2023, 139
  • [24] Person Re-identification
    Bak, Slawomir
    Bremond, Francois
    ERCIM NEWS, 2013, (95): 33 - 34
  • [25] Unsupervised Person Re-Identification With Stochastic Training Strategy
    Liu, Tianyang
    Lin, Yutian
    Du, Bo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4240 - 4250
  • [26] Training Person Re-identification Networks with Transferred Images
    Deng, Junkai
    Feng, Zhanxiang
    Chen, Peijia
    Lai, Jianhuang
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 366 - 378
  • [27] Online Audiovisual Signature Training for Person Re-identification
    Decroix, Francois-Xavier
    Pinquier, Julien
    Ferrane, Isabelle
    Lerasle, Frederic
    ICDSC 2016: 10TH INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERA, 2016: 62 - 68
  • [28] Multi-modal Pathological Pre-training via Masked Autoencoders for Breast Cancer Diagnosis
    Lu, Mengkang
    Wang, Tianyi
    Xia, Yong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 457 - 466
  • [29] On Masked Pre-training and the Marginal Likelihood
    Moreno-Munoz, Pablo
    Recasens, Pol G.
    Hauberg, Soren
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [30] Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
    Cheng, Jie
    Mei, Xiaodong
    Liu, Ming
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 8645 - 8655