Multi-Modal Pedestrian Crossing Intention Prediction with Transformer-Based Model

Cited by: 0
Authors
Wang, Ting-Wei [1 ]
Lai, Shang-Hong [1 ]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
Keywords
Pedestrian crossing intention prediction; multi-modal learning; transformer model; human posture
DOI
10.1561/116.20240019
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Discipline Codes
0808; 0809
Abstract
Pedestrian crossing intention prediction based on computer vision plays a pivotal role in enhancing the safety of autonomous driving and advanced driver assistance systems. In this paper, we present a novel multi-modal pedestrian crossing intention prediction framework leveraging the transformer model. By integrating diverse sources of information and leveraging the transformer's sequential modeling and parallelization capabilities, our system accurately predicts pedestrian crossing intentions. We introduce a novel representation of traffic environment data and incorporate lifted 3D human pose and head orientation data to enhance the model's understanding of pedestrian behavior. Experimental results demonstrate the state-of-the-art accuracy of our proposed system on benchmark datasets.
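The abstract describes fusing per-frame modality embeddings (lifted 3D pose, head orientation, and a traffic-environment representation) with a transformer before predicting crossing intention. The paper's actual architecture is not given here; the sketch below is only a minimal, hypothetical illustration of the fusion step, using a fixed query and pure-Python single-head attention. All names and dimensions (`pose_token`, `intention_query`, dim 4) are assumptions for illustration, not the authors' design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    query: list[float] of dim d; keys/values: lists of list[float] of dim d.
    Returns the attention-weighted sum of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

# Hypothetical per-frame modality embeddings (dimension 4 for illustration):
# lifted 3D pose, head orientation, and the traffic-environment representation.
pose_token = [0.9, 0.1, 0.0, 0.2]
head_token = [0.2, 0.8, 0.1, 0.0]
env_token  = [0.1, 0.0, 0.7, 0.3]
tokens = [pose_token, head_token, env_token]

# In a real model a learnable "intention" query would attend over the
# modality tokens; a fixed query here just demonstrates the fusion step.
intention_query = [0.5, 0.5, 0.5, 0.5]
fused = attention(intention_query, tokens, tokens)
print([round(x, 3) for x in fused])
```

The fused vector is a convex combination of the modality tokens; in a full transformer this would be followed by feed-forward layers and a classification head over the temporal sequence.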
Pages: 29
Related Papers
50 records
  • [31] MULTI-VIEW AND MULTI-MODAL EVENT DETECTION UTILIZING TRANSFORMER-BASED MULTI-SENSOR FUSION
    Yasuda, Masahiro
    Ohishi, Yasunori
    Saito, Shoichiro
    Harada, Noboru
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4638 - 4642
  • [32] Multi-modal pedestrian detection with misalignment based on modal-wise regression and multi-modal IoU
    Wanchaitanawong, Napat
    Tanaka, Masayuki
    Shibata, Takashi
    Okutomi, Masatoshi
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [33] Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction
    Bae, Inhwan
    Park, Jin-Hwi
    Jeon, Hae-Gon
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 270 - 289
  • [34] Multi-modal long document classification based on Hierarchical Prompt and Multi-modal Transformer
    Liu, Tengfei
    Hu, Yongli
    Gao, Junbin
    Wang, Jiapu
    Sun, Yanfeng
    Yin, Baocai
    NEURAL NETWORKS, 2024, 176
  • [35] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [36] FNR: a similarity and transformer-based approach to detect multi-modal fake news in social media
    Ghorbanpour, Faeze
    Ramezani, Maryam
    Fazli, Mohammad Amin
    Rabiee, Hamid R.
    SOCIAL NETWORK ANALYSIS AND MINING, 2023, 13 (01)
  • [38] An intention-based multi-modal trajectory prediction framework for overtaking maneuver
    Zhang, Mingfang
    Liu, Ying
    Li, Huajian
    Wang, Li
    Wang, Pangwei
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 2850 - 2855
  • [39] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection
    Guo, Ruohao
    Ying, Xianghua
    Qi, Yanyu
    Qu, Liao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7622 - 7635
  • [40] Dual-attention transformer-based hybrid network for multi-modal medical image segmentation
    Zhang, Menghui
    Zhang, Yuchen
    Liu, Shuaibing
    Han, Yahui
    Cao, Honggang
    Qiao, Bingbing
    SCIENTIFIC REPORTS, 2024, 14 (01)