PersonMAE: Person Re-Identification Pre-Training With Masked AutoEncoders

Times Cited: 0
Authors
Hu, Hezhen [1 ]
Dong, Xiaoyi [1 ]
Bao, Jianmin [2 ]
Chen, Dongdong [3 ]
Yuan, Lu
Chen, Dong [2 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230027, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
[3] Microsoft, Redmond, WA 98052 USA
Keywords
Task analysis; Pedestrians; Semantics; Robustness; Decoding; Visualization; Image color analysis; High-quality ReID representation; masked autoencoder; pre-training; NETWORK; GAN
DOI
10.1109/TMM.2024.3405649
CLC Number
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Pre-training plays an increasingly important role in learning generic feature representations for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties: multi-level awareness, occlusion robustness, and cross-region invariance. To this end, we propose a simple yet effective pre-training framework, PersonMAE, which incorporates two core designs into masked autoencoders to better serve the Person ReID task. 1) PersonMAE generates two regions from a given image, with RegionA as the input and RegionB as the prediction target. RegionA is corrupted with block-wise masking to mimic common occlusions in ReID, and only its remaining visible parts are fed into the encoder. 2) PersonMAE then predicts the whole RegionB at both the pixel level and the semantic feature level. This encourages the pre-trained feature representation to acquire the three properties above, making PersonMAE well suited to downstream Person ReID tasks and leading to state-of-the-art performance on four of them: supervised (holistic and occluded settings) and unsupervised (UDA and USL settings). Notably, in the commonly adopted supervised setting, PersonMAE with a ViT-B backbone achieves 79.8% and 69.5% mAP on the MSMT17 and OccDuke datasets, surpassing the previous state of the art by large margins of +8.0 and +5.3 mAP, respectively.
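As a rough illustration of the pipeline the abstract describes, the sketch below crops two regions from a pedestrian image (RegionA as encoder input, RegionB as prediction target) and applies block-wise masking to RegionA. This is a minimal sketch, not the authors' released code: the function names (sample_two_regions, blockwise_mask) and all hyperparameters (crop size, mask ratio, block size, 16x16 patches) are illustrative assumptions.

```python
# Minimal sketch of a PersonMAE-style input pipeline (assumed details).
import torch

def sample_two_regions(img, crop_h=192, crop_w=96):
    """Randomly crop two (possibly overlapping) regions from one pedestrian
    image: RegionA is the encoder input, RegionB the prediction target."""
    _, H, W = img.shape
    def rand_crop():
        top = torch.randint(0, H - crop_h + 1, (1,)).item()
        left = torch.randint(0, W - crop_w + 1, (1,)).item()
        return img[:, top:top + crop_h, left:left + crop_w]
    return rand_crop(), rand_crop()  # RegionA, RegionB

def blockwise_mask(patches_h, patches_w, mask_ratio=0.5, block=2):
    """Block-wise masking: hide contiguous block x block patch groups until
    roughly mask_ratio of all patches are masked, mimicking occlusion."""
    mask = torch.zeros(patches_h, patches_w, dtype=torch.bool)
    target = int(mask_ratio * patches_h * patches_w)
    while mask.sum() < target:
        top = torch.randint(0, patches_h - block + 1, (1,)).item()
        left = torch.randint(0, patches_w - block + 1, (1,)).item()
        mask[top:top + block, left:left + block] = True
    return mask.flatten()  # True = masked patch

# Usage: only the visible patches of RegionA would go through the ViT
# encoder; a decoder then regresses the *whole* RegionB at pixel level and
# at semantic feature level, per the abstract.
img = torch.rand(3, 256, 128)               # one pedestrian image (C, H, W)
region_a, region_b = sample_two_regions(img)
mask = blockwise_mask(192 // 16, 96 // 16)  # assuming 16x16 ViT patches
print(region_a.shape, region_b.shape, mask.float().mean().item())
```

Encoding only the visible patches keeps the encoder cost low (the standard MAE efficiency trick), while predicting a different region (RegionB) rather than the masked input itself is what pushes the representation toward the cross-region invariance the abstract calls for.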
Pages: 10029-10040
Page count: 12
Related Papers
50 records in total
  • [31] Peng, Chunlei; Wang, Boyu; Liu, Decheng; Wang, Nannan; Hu, Ruimin; Gao, Xinbo. Masked Attribute Description Embedding for Cloth-Changing Person Re-Identification. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27: 1475-1485.
  • [32] Tong, Zhan; Song, Yibing; Wang, Jue; Wang, Limin. VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
  • [33] Dant, Aaron P.; Kacenjar, Steve T.; Neely, Ronald. Self directed training of person re-identification with synthetic data. APPLICATIONS OF MACHINE LEARNING 2021, 2021, 11843.
  • [34] Jin, Haibo; Wang, Xiaobo; Liao, Shengcai; Li, Stan Z. Deep Person Re-Identification with Improved Embedding and Efficient Training. 2017 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB), 2017: 261-267.
  • [35] Li, Xuan; Zhang, Tao; Zhao, Xin; Sun, Xing; Yi, Zhengming. Learning fused features with parallel training for person re-identification. KNOWLEDGE-BASED SYSTEMS, 2021, 220.
  • [36] Winkler, WE. Re-identification methods for masked microdata. PRIVACY IN STATISTICAL DATABASES, PROCEEDINGS, 2004, 3050: 216-230.
  • [37] Zheng, Liang; Zhang, Hengheng; Sun, Shaoyan; Chandraker, Manmohan; Yang, Yi; Tian, Qi. Person Re-identification in the Wild. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3346-3355.
  • [38] Zhuo, Jiaxuan; Chen, Zeyu; Lai, Jianhuang; Wang, Guangcong. Occluded Person Re-Identification. 2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018.
  • [39] Layne, Ryan; Hospedales, Timothy; Gong, Shaogang. Person Re-identification by Attributes. PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2012, 2012.
  • [40] Mazzon, Riccardo; Tahir, Syed Fahad; Cavallaro, Andrea. Person re-identification in crowd. PATTERN RECOGNITION LETTERS, 2012, 33(14): 1828-1837.