Handling Imbalanced Data for Real-Time Crash Prediction: Application of Boosting and Sampling Techniques

Cited by: 16
Authors
Ariannezhad, Amin [1]
Karimpour, Abolfazl [1]
Qin, Xiao [2]
Wu, Yao-Jan [1]
Salmani, Yasamin [3]
Affiliations
[1] Univ Arizona, Dept Civil & Architectural Engn & Mech, Tucson, AZ 85721 USA
[2] Univ Wisconsin, Dept Civil & Environm Engn, Milwaukee, WI 53211 USA
[3] Bryant Univ, Coll Business, Dept Management, Project & Operat Management, Smithfield, RI 02917 USA
Keywords
Real-time crash prediction; Imbalanced data; Traffic conditions; Logistic regression; Adaptive boosting; Undersampling; Random forest (RF); BAYESIAN UPDATING APPROACH; SAFETY EVALUATION; FREEWAYS; RISK; FRAMEWORK; SEVERITY; MACHINE; WEATHER; IMPACT
DOI
10.1061/JTEPBS.0000499
Chinese Library Classification number
TU [Building science]
Discipline classification code
0813
Abstract
With the growing number of intelligent transportation system (ITS) sensors and their networkwide deployment across the nation's roadway facilities, current research and practice should concentrate on more proactive safety strategies. In recent years, real-time traffic data collected from ITS sensors have been used to develop crash prediction models. Real-time crash prediction models can be used to identify hazardous traffic conditions that might cause a crash. This study examines how data mining techniques that account for imbalanced data could improve the predictive capability of real-time crash prediction models. The term imbalanced data refers to a data set in which observations are not equally distributed among the classes (here, noncrash cases outnumber crash cases). To decrease the within-class variation of imbalanced data, the data were split into two traffic-state data sets: free-flow speed (FFS) and congestion. Three models, namely logistic regression as the baseline, random forest (RF) with random undersampling, and adaptive boosting (AdaBoost), were estimated with each data set. The results were compared with those of models estimated using the complete set of data. Model comparisons indicated that all three models achieved significantly better predictive results with the congested and FFS data sets than with the data set containing all crashes. While in some cases the results of the undersampled RF model were slightly better than those of AdaBoost, both models outperformed the logistic regression model. The results of this study demonstrated that using models designed to handle imbalanced data and lowering the variation of imbalanced data could substantially improve crash prediction accuracy. The findings could help traffic agencies implement and deploy crash prediction models for real-time applications and develop crash prevention strategies accordingly.
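The abstract names random undersampling as one way to handle the crash/noncrash imbalance. The following is a minimal sketch of that one step only, assuming a plain list-based data layout and using only the Python standard library. The function name `random_undersample`, the toy features, and the 20-in-1000 crash ratio are hypothetical and are not taken from the study; the study's actual implementation is not described in this record.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has as
    many rows as the rarest class. Returns (rows, labels), shuffled."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the rarest class (crash cases in the toy data below).
    n_min = min(len(group) for group in by_class.values())

    # Draw n_min rows without replacement from each class.
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))

    rng.shuffle(sampled)
    out_rows = [row for row, _ in sampled]
    out_labels = [label for _, label in sampled]
    return out_rows, out_labels


# Hypothetical toy data: label 1 = crash, 0 = noncrash.
# The two features per row are made-up placeholders.
rows = [(60.0 + i % 7, 0.1 * (i % 5)) for i in range(1000)]
labels = [1 if i % 50 == 0 else 0 for i in range(1000)]

balanced_rows, balanced_labels = random_undersample(rows, labels)
print("before:", Counter(labels))
print("after: ", Counter(balanced_labels))
```

The balanced set could then be passed to any classifier, such as a random forest. Undersampling discards noncrash observations, so in practice the result would typically be evaluated on an untouched, still-imbalanced test set.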
Pages: 10
Related papers
50 in total
  • [21] Real-time accident detection: Coping with imbalanced data
    Parsa, Amir Bahador
    Taghipour, Homa
    Derrible, Sybil
    Mohammadian, Abolfazl
    ACCIDENT ANALYSIS AND PREVENTION, 2019, 129 : 202 - 210
  • [22] A Genetic Programming Model for Real-Time Crash Prediction on Freeways
    Xu, Chengcheng
    Wang, Wei
    Liu, Pan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2013, 14 (02) : 574 - 586
  • [23] A deep learning approach for real-time crash prediction using vehicle-by-vehicle data
    Basso, Franco
    Pezoa, Raúl
    Varas, Mauricio
    Villalobos, Matias
    ACCIDENT ANALYSIS AND PREVENTION, 2021, 162
  • [24] REAL-TIME CARDIOVASCULAR DATA-SAMPLING
    WINCHESTER, BT
    CARDIOVASCULAR RESEARCH, 1972, 6 (03) : 302 - +
  • [25] Application of a Rule-Based Approach in Real-Time Crash Risk Prediction Model Development Using Loop Detector Data
    Pirdavani, Ali
    De Pauw, Ellen
    Brijs, Tom
    Daniels, Stijn
    Magis, Maarten
    Bellemans, Tom
    Wets, Geert
    TRAFFIC INJURY PREVENTION, 2015, 16 (08) : 786 - 791
  • [26] Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    Napolitano, Amri
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2011, 41 (03) : 552 - 568
  • [27] Handling Imbalanced Data in Customer Churn Prediction Using Combined Sampling and Weighted Random Forest
    Effendy, Veronikha
    Adiwijaya
    Baizal, Z. K. A.
    2014 2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2014
  • [28] Real-time-crash prediction model for application to crash prevention in freeway traffic
    Lee, C
    Hellinga, B
    Saccomanno, F
    STATISTICAL METHODS AND MODELING AND SAFETY DATA, ANALYSIS, AND EVALUATION: SAFETY AND HUMAN PERFORMANCE, 2003, (1840) : 67 - 77
  • [29] Real-time Prediction of Information Search Channel Using Data Mining Techniques
    Khatwani, Gaurav
    Srivastava, Praveen Ranjan
    2015 INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT), 2015, : 924 - 929
  • [30] A real-time crash prediction model for the ramp vicinities of urban expressways
    Hossain, Moinul
    Muromachi, Yasunori
    IATSS RESEARCH, 2013, 37 (01) : 68 - 79