Pedestrian Attribute Recognition Based on Multimodal Transformer

Times cited: 0

Authors:

Liu, Dan [1]

Song, Wei [1, 2, 3]

Zhao, Xiaobing [1, 3]

Affiliations:

[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China

[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Go, MOE, Beijing 100081, Peoples R China

[3] Minzu Univ China, Natl Language Resource Monitoring & Res Ctr Minor, Beijing 100081, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I | 2024 / Vol. 14425

Keywords:

Pedestrian Attribute Recognition; Multimodal Learning; Transformer;

DOI:

10.1007/978-981-99-8429-9_34

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pedestrian attribute recognition (PAR) is sensitive to variations in camera viewpoint, lighting, and occlusion. Improving recognition accuracy so that PAR can be applied in varied, complex scenarios is therefore a key task. This paper uses an image-text multimodal Transformer to learn intra-modal and inter-modal correlations from pedestrian images and attribute labels. It explores how well six different multimodal fusion frameworks suit attribute recognition, and compares how each framework's method of dividing the fused features affects recognition accuracy. The comparative experiments verify the robustness and efficiency of the Early Concatenate framework, which achieves the best score on several metrics on the two major public PAR datasets, PA100k and RAP. Beyond proposing a new high-accuracy Transformer-based multimodal network, this paper suggests feasible directions for further PAR research, and its comparison of multimodal frameworks offers insights applicable to other multimodal tasks.
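The abstract does not spell out the architecture. The following is a minimal sketch under stated assumptions. It assumes that "Early Concatenate" means joining image patch embeddings and attribute-label embeddings into a single token sequence before a shared self-attention layer. It also assumes that "fused feature division" means splitting the output sequence back into an image part and one feature per attribute. All function names (`early_concat_fusion`, `attribute_probs`, `random_matrix`), the embedding size, the token counts, and the per-attribute sigmoid head are hypothetical illustrations in pure Python, not the authors' implementation.

```python
import math
import random

# Minimal sketch of "early concatenation" fusion, assuming image patch tokens and
# attribute-label tokens share one embedding size d. All names, sizes, and the
# per-attribute head below are hypothetical, not taken from the paper.


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over every token in x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def early_concat_fusion(image_tokens, attr_tokens, wq, wk, wv):
    """Concatenate both modalities, attend jointly, then divide the fused output."""
    tokens = image_tokens + attr_tokens  # (N_img + N_attr) x d joint sequence
    fused, weights = self_attention(tokens, wq, wk, wv)
    n_img = len(image_tokens)
    # Hypothetical division: the first N_img rows are image features, the rest
    # are one fused feature per attribute label.
    return fused[:n_img], fused[n_img:], weights


def attribute_probs(attr_features, w_head):
    """Hypothetical head: a shared linear layer + sigmoid per attribute token."""
    return [1.0 / (1.0 + math.exp(-sum(f * w for f, w in zip(feat, w_head))))
            for feat in attr_features]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 6 image patch tokens and 3 attribute-label tokens, d = 4.
rng = random.Random(0)
d = 4
image_tokens = random_matrix(6, d, rng)
attr_tokens = random_matrix(3, d, rng)
wq, wk, wv = (random_matrix(d, d, rng) for _ in range(3))
img_out, attr_out, weights = early_concat_fusion(image_tokens, attr_tokens, wq, wk, wv)
probs = attribute_probs(attr_out, [rng.gauss(0.0, 0.5) for _ in range(d)])

# Row 6 is the first attribute token: its weights over columns 0-5 are
# inter-modal (attribute -> image), over columns 6-8 intra-modal.
inter = sum(weights[6][:6])
intra = sum(weights[6][6:])
```

The attention matrix spans the joint sequence, so each row mixes intra-modal and inter-modal weights. One shared encoder can therefore model both kinds of correlation the abstract mentions. A late-fusion variant would typically encode each modality separately before combining them.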


Pages: 422-433

Number of pages: 12
