Multi-Modal Pedestrian Crossing Intention Prediction with Transformer-Based Model

Cited by: 0
Authors
Wang, Ting-Wei [1 ]
Lai, Shang-Hong [1 ]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
Keywords
Pedestrian crossing intention prediction; multi-modal learning; transformer model; human posture
DOI
10.1561/116.20240019
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Code
0808; 0809
Abstract
Pedestrian crossing intention prediction based on computer vision plays a pivotal role in enhancing the safety of autonomous driving and advanced driver assistance systems. In this paper, we present a novel multi-modal pedestrian crossing intention prediction framework built on the transformer model. By integrating diverse sources of information and exploiting the transformer's sequential modeling and parallelization capabilities, our system accurately predicts pedestrian crossing intentions. We introduce a novel representation of traffic environment data and incorporate lifted 3D human pose and head orientation data to enhance the model's understanding of pedestrian behavior. Experimental results demonstrate state-of-the-art accuracy of our proposed system on benchmark datasets.
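The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how such a multi-modal temporal transformer could be wired together: per-frame traffic-context features, lifted 3D pose, and head orientation are each projected into a shared embedding space, fused additively, and passed through a transformer encoder before a binary crossing/not-crossing head. The class name CrossingIntentionTransformer, all feature dimensions (64-d context vector, 17 pose joints, 3-d head-orientation vector), and the additive fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: NOT the paper's released code. Modality names,
# feature dimensions, and the fusion strategy (additive fusion of per-frame
# modality embeddings followed by a temporal transformer encoder) are assumed.
import torch
import torch.nn as nn


class CrossingIntentionTransformer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=3, seq_len=16):
        super().__init__()
        # Per-modality projections into a shared embedding space (dims assumed).
        self.proj_context = nn.Linear(64, d_model)   # traffic-environment features
        self.proj_pose = nn.Linear(17 * 3, d_model)  # lifted 3D pose (17 joints x xyz)
        self.proj_head = nn.Linear(3, d_model)       # head-orientation vector
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 1)  # crossing vs. not crossing

    def forward(self, context, pose, head):
        # context: (B, T, 64), pose: (B, T, 51), head: (B, T, 3)
        tokens = (self.proj_context(context)
                  + self.proj_pose(pose)
                  + self.proj_head(head)) + self.pos_embed[:, :context.size(1)]
        h = self.encoder(tokens)  # temporal self-attention over the T observed frames
        return torch.sigmoid(self.classifier(h.mean(dim=1)))  # (B, 1) probability


# Example usage: random tensors standing in for a 16-frame pedestrian track.
model = CrossingIntentionTransformer()
p_cross = model(torch.randn(2, 16, 64), torch.randn(2, 16, 51), torch.randn(2, 16, 3))
print(p_cross.shape)  # torch.Size([2, 1])
```

Token-wise concatenation of modalities or cross-modal attention would be equally plausible readings of the abstract; the additive fusion above is simply the most compact variant to sketch.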
Pages: 29
Related Papers (50 in total; items 21-30 shown)
• [21] Transformer-Based Multi-Modal Learning for Multi-Label Remote Sensing Image Classification. Hoffmann, David Sebastian; Clasen, Kai Norman; Demir, Begum. IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023: 4891-4894.
• [22] Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection. Ju, Xincheng; Zhang, Dong; Li, Junhui; Zhou, Guodong. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 512-520.
• [23] Pedestrian Crossing Intention Prediction Method Based on Multi-Feature Fusion. Ma, Jun; Rong, Wenhui. World Electric Vehicle Journal, 2022, 13(8).
• [24] Multi information pedestrian crossing intention prediction based on mixed attention mechanism. Sang, Hai-Feng; Liu, Yu-Long; Liu, Quan-Kai. Kongzhi yu Juece/Control and Decision, 2024, 39(12): 3946-3954.
• [25] Multi-Modal Hybrid Architecture for Pedestrian Action Prediction. Rasouli, Amir; Yau, Tiffany; Rohani, Mohsen; Luo, Jun. 2022 IEEE Intelligent Vehicles Symposium (IV), 2022: 91-97.
• [26] Representation, Alignment, Fusion: A Generic Transformer-Based Framework for Multi-modal Glaucoma Recognition. Zhou, You; Yang, Gang; Zhou, Yang; Ding, Dayong; Zhao, Jianchun. Medical Image Computing and Computer Assisted Intervention, MICCAI 2023, Pt VII, 2023, 14226: 704-713.
• [27] A Transformer-based multi-modal fusion network for 6D pose estimation. Hong, Jia-Xin; Zhang, Hong-Bo; Liu, Jing-Hua; Lei, Qing; Yang, Li-Jie; Du, Ji-Xiang. Information Fusion, 2024, 105.
• [28] Pedestrian Detection Based on Multi-modal Cooperation. Zhang, Yan-ning; Tong, Xiao-min; Zhang, Xiu-wei; Zheng, Jiang-bin; Zhou, Jun; You, Si-wei. 2008 IEEE 10th Workshop on Multimedia Signal Processing, Vols 1 and 2, 2008: 151+.
• [29] Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer. Zhang, Zhihao; Chen, Yiwei; Zhang, Weizhan; Yan, Caixia; Zheng, Qinghua; Wang, Qi; Chen, Wangdu. Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, 2023: 3560-3568.
• [30] Multi-modal Intention Prediction with Probabilistic Movement Primitives. Dermy, Oriane; Charpillet, Francois; Ivaldi, Serena. Human Friendly Robotics, 2019, 7: 181-196.