Multi-Modal Pedestrian Crossing Intention Prediction with Transformer-Based Model

Cited by: 0
Authors
Wang, Ting-Wei [1 ]
Lai, Shang-Hong [1 ]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
Keywords
Pedestrian crossing intention prediction; multi-modal learning; transformer model; human posture
DOI
10.1561/116.20240019
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Code
0808; 0809
Abstract
Pedestrian crossing intention prediction based on computer vision plays a pivotal role in enhancing the safety of autonomous driving and advanced driver assistance systems. In this paper, we present a novel multi-modal pedestrian crossing intention prediction framework built on the transformer model. By integrating diverse sources of information and exploiting the transformer's sequential modeling and parallelization capabilities, our system accurately predicts pedestrian crossing intentions. We introduce a novel representation of traffic environment data and incorporate lifted 3D human pose and head orientation data to enhance the model's understanding of pedestrian behavior. Experimental results demonstrate state-of-the-art accuracy of our proposed system on benchmark datasets.
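The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how such a multi-modal temporal transformer could be wired together: per-frame traffic-context features, lifted 3D pose, and head orientation are each projected into a shared embedding space, fused additively, and passed through a transformer encoder before a binary crossing/not-crossing head. The class name CrossingIntentionTransformer, all feature dimensions (64-d context vector, 17 pose joints, 3-d head-orientation vector), and the additive fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: NOT the paper's released code. Modality names,
# feature dimensions, and the fusion strategy (additive fusion of per-frame
# modality embeddings followed by a temporal transformer encoder) are assumed.
import torch
import torch.nn as nn


class CrossingIntentionTransformer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=3, seq_len=16):
        super().__init__()
        # Per-modality projections into a shared embedding space (dims assumed).
        self.proj_context = nn.Linear(64, d_model)   # traffic-environment features
        self.proj_pose = nn.Linear(17 * 3, d_model)  # lifted 3D pose (17 joints x xyz)
        self.proj_head = nn.Linear(3, d_model)       # head-orientation vector
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 1)  # crossing vs. not crossing

    def forward(self, context, pose, head):
        # context: (B, T, 64), pose: (B, T, 51), head: (B, T, 3)
        tokens = (self.proj_context(context)
                  + self.proj_pose(pose)
                  + self.proj_head(head)) + self.pos_embed[:, :context.size(1)]
        h = self.encoder(tokens)  # temporal self-attention over the T observed frames
        return torch.sigmoid(self.classifier(h.mean(dim=1)))  # (B, 1) probability


# Example usage: random tensors standing in for a 16-frame pedestrian track.
model = CrossingIntentionTransformer()
p_cross = model(torch.randn(2, 16, 64), torch.randn(2, 16, 51), torch.randn(2, 16, 3))
print(p_cross.shape)  # torch.Size([2, 1])
```

Token-wise concatenation of modalities or cross-modal attention would be equally plausible readings of the abstract; the additive fusion above is simply the most compact variant to sketch.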
Pages: 29
Related Papers (50 in total; items 21-30 shown)
• [21] Transformer-Based Multi-Modal Learning for Multi-Label Remote Sensing Image Classification. Hoffmann, David Sebastian; Clasen, Kai Norman; Demir, Begum. IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023: 4891-4894.
• [22] Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection. Ju, Xincheng; Zhang, Dong; Li, Junhui; Zhou, Guodong. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 512-520.
• [23] Pedestrian Crossing Intention Prediction Method Based on Multi-Feature Fusion. Ma, Jun; Rong, Wenhui. World Electric Vehicle Journal, 2022, 13(8).
• [24] Multi information pedestrian crossing intention prediction based on mixed attention mechanism. Sang, Hai-Feng; Liu, Yu-Long; Liu, Quan-Kai. Kongzhi yu Juece/Control and Decision, 2024, 39(12): 3946-3954.
• [25] Multi-Modal Hybrid Architecture for Pedestrian Action Prediction. Rasouli, Amir; Yau, Tiffany; Rohani, Mohsen; Luo, Jun. 2022 IEEE Intelligent Vehicles Symposium (IV), 2022: 91-97.
• [26] Representation, Alignment, Fusion: A Generic Transformer-Based Framework for Multi-modal Glaucoma Recognition. Zhou, You; Yang, Gang; Zhou, Yang; Ding, Dayong; Zhao, Jianchun. Medical Image Computing and Computer Assisted Intervention, MICCAI 2023, Pt VII, 2023, 14226: 704-713.
• [27] A Transformer-based multi-modal fusion network for 6D pose estimation. Hong, Jia-Xin; Zhang, Hong-Bo; Liu, Jing-Hua; Lei, Qing; Yang, Li-Jie; Du, Ji-Xiang. Information Fusion, 2024, 105.
• [28] Pedestrian Detection Based on Multi-modal Cooperation. Zhang, Yan-ning; Tong, Xiao-min; Zhang, Xiu-wei; Zheng, Jiang-bin; Zhou, Jun; You, Si-wei. 2008 IEEE 10th Workshop on Multimedia Signal Processing, Vols 1 and 2, 2008: 151+.
• [29] Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer. Zhang, Zhihao; Chen, Yiwei; Zhang, Weizhan; Yan, Caixia; Zheng, Qinghua; Wang, Qi; Chen, Wangdu. Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 2023: 3560-3568.
• [30] Multi-modal Intention Prediction with Probabilistic Movement Primitives. Dermy, Oriane; Charpillet, Francois; Ivaldi, Serena. Human Friendly Robotics, 2019, 7: 181-196.