Pedestrian Attribute Recognition Based on Multimodal Transformer

Cited by: 0

Authors:

Liu, Dan [1]

Song, Wei [1,2,3]

Zhao, Xiaobing [1,3]

Affiliations:

[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China

[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Go, MOE, Beijing 100081, Peoples R China

[3] Minzu Univ China, Natl Language Resource Monitoring & Res Ctr Minor, Beijing 100081, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I | 2024 / Vol. 14425

Keywords:

Pedestrian Attribute Recognition; Multimodal Learning; Transformer;

DOI:

10.1007/978-981-99-8429-9_34

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pedestrian attribute recognition (PAR) is sensitive to variations in shooting angle, lighting, and occlusion, so improving recognition accuracy across complex scenarios is a central task. This paper builds on an image-text multimodal Transformer to learn intra-modal and inter-modal correlations from pedestrian images and attribute labels. It explores the applicability of six multimodal fusion frameworks to attribute recognition, and compares how each framework's method of dividing the fused features affects recognition accuracy. Comparative experiments verify the robustness and efficiency of the Early Concatenate framework, which achieves the best scores on multiple metrics on two major public PAR datasets, PA100k and RAP. Beyond proposing a new Transformer-based, high-accuracy multimodal network, the paper offers feasible ideas and directions for further PAR research, and its comparison of multimodal frameworks provides a perspective that other multimodal tasks can draw on.
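
The abstract names an Early Concatenate fusion framework but does not spell out its architecture. As a rough illustration of the general idea only, the sketch below joins image tokens and attribute-label tokens into one sequence before a single self-attention pass, then divides the fused sequence back into its image and label parts. Everything here is an assumption made for illustration: the function names, the identity projections in place of learned weights, the toy token sizes, and the sigmoid scoring head are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections and many layers.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        fused.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return fused


def early_concatenate(image_tokens, label_tokens):
    """Concatenate both modalities first, attend jointly, then split again."""
    joint = image_tokens + label_tokens      # early fusion: one shared sequence
    fused = self_attention(joint)            # intra- and inter-modal mixing
    n_img = len(image_tokens)
    return fused[:n_img], fused[n_img:]      # one possible fused-feature division


def attribute_scores(fused_label_tokens):
    """Hypothetical head: sigmoid of the mean of each fused label token."""
    return [1.0 / (1.0 + math.exp(-sum(t) / len(t))) for t in fused_label_tokens]


# Toy inputs: 4 image patch tokens and 3 attribute-label tokens, 8 dims each.
random.seed(0)
image_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
label_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

fused_image, fused_label = early_concatenate(image_tokens, label_tokens)
scores = attribute_scores(fused_label)
```

In this sketch each fused label token is a weighted average over all image and label tokens, which is how a joint sequence lets attribute labels attend to image regions and to each other in one step. Fusing later, after separate per-modality encoders, would be a different design, and the abstract reports comparing six such variants.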


Pages: 422-433

Page count: 12
