Handling Imbalanced Data for Real-Time Crash Prediction: Application of Boosting and Sampling Techniques

Cited by: 16
Authors
Ariannezhad, Amin [1]
Karimpour, Abolfazl [1]
Qin, Xiao [2]
Wu, Yao-Jan [1]
Salmani, Yasamin [3]
Affiliations
[1] Univ Arizona, Dept Civil & Architectural Engn & Mech, Tucson, AZ 85721 USA
[2] Univ Wisconsin, Dept Civil & Environm Engn, Milwaukee, WI 53211 USA
[3] Bryant Univ, Coll Business, Dept Management, Project & Operat Management, Smithfield, RI 02917 USA
Keywords
Real-time crash prediction; Imbalanced data; Traffic conditions; Logistic regression; Adaptive boosting; Undersampling; Random forest (RF); BAYESIAN UPDATING APPROACH; SAFETY EVALUATION; FREEWAYS; RISK; FRAMEWORK; SEVERITY; MACHINE; WEATHER; IMPACT
DOI
10.1061/JTEPBS.0000499
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
With a growing number of intelligent transportation system sensors and the networkwide deployment of those across the nation's roadway facilities, current research and practices should concentrate on more proactive safety strategies. In recent years, real-time traffic data collected from ITS sensors have been utilized to develop crash prediction models. Real-time crash prediction models can be used to identify hazardous traffic conditions that might cause a crash. This study aims to examine how employing data mining techniques that account for imbalanced data could improve the predictive capability of real-time crash prediction models. The term imbalanced data refers to a condition where the number of observations in each class is not equally distributed among the data set (noncrash cases outnumber crash cases). To decrease the within-class variation of imbalanced data, the data were split into two traffic-state data sets: free-flow speed (FFS) and congestion. Three models, including logistic regression as the baseline, random forest (RF) with random undersampling, and Adaptive Boosting (AdaBoost), were estimated with each data set. The results were compared with the models that were estimated using the complete set of data. Model comparisons indicated that all three models achieved significantly better predictive results with the congested and FFS data sets as opposed to the data set containing all crashes and that, while in some cases the results of the undersampled RF model were slightly better than those of AdaBoost, both models outperformed the logistic regression model. The results of this study demonstrated that using models to deal with imbalanced data and lowering the variation of imbalanced data could substantially improve crash prediction accuracy. The findings could help traffic agencies to practically implement and deploy crash prediction models for real-time applications and develop crash prevention strategies accordingly.
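The random undersampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1:1 target ratio, the function name `random_undersample`, and the toy data are assumptions made for illustration only, since the abstract does not state the sampling ratio used.

```python
import random
from collections import Counter


def random_undersample(features, labels, majority_label=0, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: a 1:1 class ratio is the target; the abstract does not give one.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]
    # Keep every minority (crash) row and an equally sized random subset
    # of the majority (noncrash) rows.
    n_keep = min(len(majority_idx), len(minority_idx))
    kept = sorted(rng.sample(majority_idx, k=n_keep) + minority_idx)
    return [features[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 95 noncrash (0) and 5 crash (1) observations.
labels = [0] * 95 + [1] * 5
features = [[float(i)] for i in range(100)]

X_bal, y_bal = random_undersample(features, labels)
print(Counter(labels))  # class counts before undersampling
print(Counter(y_bal))   # class counts after undersampling
```

Under these assumptions, a classifier such as a random forest would then be trained on `X_bal` and `y_bal` rather than on the full imbalanced set. Following the abstract, the same step could be applied separately to the free-flow and congested subsets.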
Pages: 10
Related papers
50 in total
  • [1] Influence of Different Sampling Techniques on The Real-time Crash Risk Prediction Model
    Yin, Yuhan
    Huang, Yulin
    Zhang, Linxiao
    Gao, Zhen
    PROCEEDINGS OF THE 2019 14TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2019), 2019: 1795-1799
  • [2] Real-time crash prediction on freeways using data mining and emerging techniques
    You J.
    Wang J.
    Guo J.
    Journal of Modern Transportation, 2017, 25 (2): 116-123
  • [3] Real-time crash prediction on freeways using data mining and emerging techniques
    Jinming You
    Junhua Wang
    Jingqiu Guo
    Journal of Modern Transportation, 2017, (02): 116-123
  • [4] Wasserstein Generative Adversarial Network to Address the Imbalanced Data Problem in Real-Time Crash Risk Prediction
    Man, Cheuk Ki
    Quddus, Mohammed
    Theofilatos, Athanasios
    Yu, Rongjie
    Imprialou, Marianna
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (12): 23002-23013
  • [5] Class-imbalanced crash prediction based on real-time traffic and weather data: A driving simulator study
    Elamrani Abou Elassad, Zouhair
    Mousannif, Hajar
    Al Moatassime, Hassan
    TRAFFIC INJURY PREVENTION, 2020, 21 (03): 201-208
  • [6] PCA-based missing information imputation for real-time crash likelihood prediction under imbalanced data
    Ke, Jintao
    Zhang, Shuaichao
    Yang, Hai
    Chen, Xiqun
    TRANSPORTMETRICA A-TRANSPORT SCIENCE, 2018, 15 (02): 872-895
  • [7] OUBoost: boosting based over and under sampling technique for handling imbalanced data
    Mostafaei, Sahar Hassanzadeh
    Tanha, Jafar
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (10): 3393-3411
  • [8] OUBoost: boosting based over and under sampling technique for handling imbalanced data
    Sahar Hassanzadeh Mostafaei
    Jafar Tanha
    International Journal of Machine Learning and Cybernetics, 2023, 14: 3393-3411
  • [9] Handling Imbalanced Data in Road Crash Severity Prediction by Machine Learning Algorithms
    Fiorentini, Nicholas
    Losa, Massimo
    INFRASTRUCTURES, 2020, 5 (07)
  • [10] Enhancing Crash Injury Severity Prediction on Imbalanced Crash Data by Sampling Technique with Variable Selection
    Yahaya, Mahama
    Jiang, Xinguo
    Fu, Chuanyun
    Bashir, Kamal
    Fan, Wenbo
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019: 363-368