Pedestrian Attribute Recognition Based on Multimodal Transformer

Times cited: 0

Authors:

Liu, Dan [1]

Song, Wei [1, 2, 3]

Zhao, Xiaobing [1, 3]

Affiliations:

[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China

[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Go, MOE, Beijing 100081, Peoples R China

[3] Minzu Univ China, Natl Language Resource Monitoring & Res Ctr Minor, Beijing 100081, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I | 2024 / Vol. 14425

Keywords:

Pedestrian Attribute Recognition; Multimodal Learning; Transformer

DOI:

10.1007/978-981-99-8429-9_34

CLC number:

TP18 [Artificial Intelligence Theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Pedestrian attribute recognition (PAR) is sensitive to varying camera angles, lighting, and occlusion, so improving recognition accuracy for use in diverse, complex scenarios is a key task. Building on an image-text multimodal Transformer, this paper learns intra-modal and inter-modal correlations from pedestrian images and attribute labels. It explores how well six different multimodal fusion frameworks suit attribute recognition, and compares and analyzes how each framework's method of dividing the fused features affects recognition accuracy. Comparative experiments verify the robustness and efficiency of the Early Concatenate framework, which achieves the best scores on several metrics on the two major public PAR datasets, PA100k and RAP. Besides proposing a new high-accuracy Transformer-based multimodal network, the paper offers feasible ideas and directions for further PAR research, and its comparative discussion of the various multimodal frameworks provides a perspective that may carry over to other multimodal tasks.
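To make the idea of early-concatenation fusion concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: it assumes that image patch embeddings and attribute-label embeddings share one dimension, are concatenated into a single token sequence, and pass through one scaled dot-product self-attention step with identity query/key/value projections (no residuals, LayerNorm, feed-forward block, or multiple heads). The fused sequence is then divided back by position, and each fused label token is scored with a hypothetical per-attribute weight vector and a sigmoid. All function names, toy embeddings, and weights are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumed simplification: identity Q/K/V projections, no residual
    connection, no LayerNorm, no feed-forward block.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Each output token is an attention-weighted average of ALL tokens,
        # so image tokens and label tokens exchange information.
        fused.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)])
    return fused


def early_concat_fusion(
    image_tokens: List[Vector], label_tokens: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical early concatenation: one joint sequence, one encoder."""
    sequence = image_tokens + label_tokens  # concatenate before any fusion layer
    fused = self_attention(sequence)
    n_img = len(image_tokens)
    # Divide the fused features back by position: image part, label part.
    return fused[:n_img], fused[n_img:]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_attributes(
    image_tokens: List[Vector],
    label_tokens: List[Vector],
    classifier_weights: List[Vector],
) -> Vector:
    """Hypothetical multi-label head: one weight vector per attribute."""
    _, fused_labels = early_concat_fusion(image_tokens, label_tokens)
    return [
        sigmoid(sum(w * x for w, x in zip(weights, feature)))
        for weights, feature in zip(classifier_weights, fused_labels)
    ]


if __name__ == "__main__":
    image = [[0.2, 0.9], [0.7, 0.1], [0.4, 0.4]]  # 3 toy patch embeddings
    labels = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for 2 attributes
    heads = [[1.5, -0.5], [-0.5, 1.5]]  # toy per-attribute weights
    print(predict_attributes(image, labels, heads))
```

Because image and label tokens attend to each other within one sequence, each label's fused vector mixes intra-modal (label-to-label) and inter-modal (label-to-image) information. Slicing the output by position is only one possible way to divide the fused features; the paper compares how different division methods affect accuracy.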


Pages: 422-433

Number of pages: 12
