PARFormer: Transformer-Based Multi-Task Network for Pedestrian Attribute Recognition

Cited by: 10
Authors
Fan, Xinwen [1, 2]
Zhang, Yukang [1, 2]
Lu, Yang [1, 2]
Wang, Hanzi [1, 2]
Affiliations
[1] Xiamen Univ, Minist Educ China, Sch Informat, Fujian Key Lab Sensing & Comp Smart City, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Com, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Feature extraction; Task analysis; Visualization; Multitasking; Image recognition; Training; pedestrian attribute recognition; transformer; Feature processing; viewpoint information; PERSON REIDENTIFICATION
DOI
10.1109/TCSVT.2023.3285411
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Pedestrian attribute recognition (PAR) has received increasing attention because of its wide application in video surveillance and pedestrian analysis. Extracting robust feature representation is one of the key challenges in this task. The existing methods primarily rely on convolutional neural networks (CNNs) as the backbone network for feature extraction. However, these methods mainly focus on small discriminative regions while ignoring the global perspective. To overcome these limitations, we propose PARFormer, a pure transformer-based multi-task PAR network consisting of four modules. In the feature extraction module, we build a transformer-based strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks compared with the existing CNN-based baseline methods. Since the PAR task is vulnerable to environmental factors, we enhance feature robustness in the feature processing module and propose an effective data augmentation strategy named batch random mask (BRM) block to reinforce the attentive feature learning of random patches. Furthermore, we propose a multi-attribute center loss (MACL) to augment the inter-attribute discriminability of feature representations. As viewpoints can affect some specific attributes, in the viewpoint perception module, we propose a multi-view contrastive loss (MVCL) that enables the network to exploit the viewpoint information. In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions. These modules interact and jointly learn a highly discriminative feature space and supervise the generation of the final features. Extensive experimental results show that the proposed PARFormer network performs well compared to the state-of-the-art methods on several public datasets, including PETA, RAP, and PA100K. Code will be released at https://github.com/xwf199/PARFormer.
Pages: 411-423
Page count: 13