Taxi drivers' traffic violations detection using random forest algorithm: A case study in China

Cited by: 4
Authors
Wan, Ming [1]
Wu, Qian [1]
Yan, Lixin [1]
Guo, Junhua [1]
Li, Wenxia [1]
Lin, Wei [2]
Lu, Shan [3]
Affiliations
[1] East China Jiaotong Univ, Sch Transportat Engn, 808 Shuanggang East St,Nanchang Econ Dev Zone, Nanchang 330013, Jiangxi, Peoples R China
[2] Traff Adm Bur Nanchang Publ Secur Bur, Nanchang, Jiangxi, Peoples R China
[3] Shenzhen Polytech, Inst Intelligence Sci & Engn, Shenzhen, Peoples R China
Keywords
Taxi drivers' traffic violations; impact factors; imbalanced dataset; Random Forest; SHAP; SAFETY; CLASSIFICATION; EXPERIENCE; TIME;
DOI
10.1080/15389588.2023.2191286
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004; 120402
Abstract
Objective: To explore the impacts of several key factors on taxi drivers' traffic violations and to give traffic management departments a scientific basis for decisions that reduce traffic fatalities and injuries.
Methods: A total of 43,458 electronic enforcement records of taxi drivers' traffic violations in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were used to explore the characteristics of traffic violations. A random forest algorithm was used to predict the severity of taxi drivers' traffic violations, and 11 factors affecting traffic violations, including time, road conditions, environment, and taxi companies, were analyzed using the SHapley Additive exPlanations (SHAP) framework.
Results: First, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset; the imbalance ratio (IR) of the original imbalanced dataset was reduced from 6.61% to 2.60%. Second, a prediction model for the severity of taxi drivers' traffic violations was built with Random Forest, and it achieved accuracy, m_F1, m_G-mean, m_AUC, and m_AP of 0.877, 0.849, 0.599, 0.976, and 0.957, respectively. Compared with Decision Tree, XGBoost, AdaBoost, and Neural Network, the Random Forest model performed best on these measures. Finally, the SHAP framework was used to improve the interpretability of the model and identify important factors affecting taxi drivers' traffic violations. Functional district, location of the violation, and road grade had a high impact on the probability of traffic violations, with mean SHAP values of 0.39, 0.36, and 0.26, respectively.
Conclusions: The findings of this paper may help to uncover the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing taxi drivers' traffic violations and improving road safety management.
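The abstract's two preprocessing and evaluation ideas, an imbalance ratio and a macro-averaged F1 score, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names, the toy labels, the definition of the imbalance ratio as largest class count over smallest, and the reading of m_F1 as macro-averaged F1 are all assumptions made for illustration. The resampling step equalizes the classes completely, which is the idea behind each bag in a balanced bagging ensemble, so it will not reproduce the ratios reported above.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def undersample(labels, seed=0):
    """Return sorted indices of a random subsample in which every class
    keeps only as many items as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 66 minor violations and 10 serious ones.
labels = ["minor"] * 66 + ["serious"] * 10
kept = undersample(labels)
balanced = [labels[i] for i in kept]
print(imbalance_ratio(labels), imbalance_ratio(balanced))
```

In practice a library implementation such as the BalancedBaggingClassifier in imbalanced-learn, combined with a random forest and a SHAP explainer, would replace these helpers; the sketch only makes the quantities concrete.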
Pages: 362-370
Number of pages: 9
Related papers
50 in total
  • [31] Investigating Effects of Temporal and Locational Factors on Traffic Violations of Taxi Drivers: Data from Off-Site Enforcement Camera System
    Liu, Yan
    Liu, Haiyue
    Zhou, Yue
    Fu, Chuanyun
    Zhu, Quan
    CICTP 2019: TRANSPORTATION IN CHINA-CONNECTING THE WORLD, 2019, : 356 - 366
  • [32] USING MATHEMATICS AS A TOOL IN RWANDAN WORKPLACE SETTINGS: THE CASE OF TAXI DRIVERS
    Gahamanyi, Marcel
    Andersson, Ingrid
    Bergsten, Christer
    CERME 6 - PROCEEDINGS OF THE 6TH CONGRESS OF THE EUROPEAN SOCIETY FOR RESEARCH IN MATHEMATICS EDUCATION, 2010, : 1484 - 1493
  • [33] Detection and Attribution of Alpine Inland Lake Changes by Using Random Forest Algorithm
    Guo, Wei
    Ni, Xiangnan
    Mu, Yi
    Liu, Tong
    Zhang, Junzhe
    REMOTE SENSING, 2023, 15 (04)
  • [34] Fast Defect Detection Algorithm on the Variety Surface with Random Forest using GPUs
    Kwon, Bae-guen
    Kang, Dong-joong
    2011 11TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2011, : 1135 - 1136
  • [35] APPLICATION OF RANDOM FOREST ALGORITHM TO SENTINEL-1 FOR PLANTATION DETECTION: CASE STUDY OF TESSO NILO ECOSYSTEM
    Ghivarry, Giusti
    Sukmawijaya, Adhera
    SEVENTH GEOINFORMATION SCIENCE SYMPOSIUM 2021, 2021, 12082
  • [36] DETECTION OF INVARIANT VEGETATION AREAS IN TIME SERIES USING RANDOM FOREST ALGORITHM
    Lacerda, Eduardo Ribeiro
    Vicens, Raul Sanchez
    GEOGRAPHIA-UFF, 2021, 23 (50):
  • [37] Investigating health issues of motorcycle taxi drivers: A case study of Vietnam
    Truong, Long T.
    Tay, Richard
    Nguyen, Hang T. T.
    JOURNAL OF TRANSPORT & HEALTH, 2021, 20
  • [38] Measuring urban poverty using multi-source data and a random forest algorithm: A case study in Guangzhou
    Niu, Tong
    Chen, Yimin
    Yuan, Yuan
    SUSTAINABLE CITIES AND SOCIETY, 2020, 54
  • [39] Emotional states of drivers and the impact on speed, acceleration and traffic violations-A simulator study
    Roidl, Ernst
    Frehse, Berit
    Hoeger, Rainer
    ACCIDENT ANALYSIS AND PREVENTION, 2014, 70 : 282 - 292
  • [40] Network Traffic Clustering Using Random Forest Proximities
    Wang, Yu
    Xiang, Yang
    Zhang, Jun
    2013 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2013,