Taxi drivers' traffic violations detection using random forest algorithm: A case study in China

Citations: 4
Authors
Wan, Ming [1]
Wu, Qian [1]
Yan, Lixin [1]
Guo, Junhua [1]
Li, Wenxia [1]
Lin, Wei [2]
Lu, Shan [3]
Affiliations
[1] East China Jiaotong Univ, Sch Transportat Engn, 808 Shuanggang East St,Nanchang Econ Dev Zone, Nanchang 330013, Jiangxi, Peoples R China
[2] Traff Adm Bur Nanchang Publ Secur Bur, Nanchang, Jiangxi, Peoples R China
[3] Shenzhen Polytech, Inst Intelligence Sci & Engn, Shenzhen, Peoples R China
Keywords
Taxi drivers' traffic violations; impact factors; imbalanced dataset; Random Forest; SHAP; SAFETY; CLASSIFICATION; EXPERIENCE; TIME
DOI
10.1080/15389588.2023.2191286
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Objective: To examine how several key factors affect taxi drivers' traffic violations and to give traffic management departments an evidence base for decisions that reduce traffic fatalities and injuries.
Methods: A total of 43,458 electronic enforcement records of taxi drivers' traffic violations in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were used to explore the characteristics of traffic violations. A random forest algorithm was used to predict the severity of taxi drivers' traffic violations, and 11 factors affecting traffic violations, including time, road conditions, environment, and taxi companies, were analyzed using the SHapley Additive exPlanations (SHAP) framework.
Results: First, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset, reducing the imbalance ratio (IR) from 6.61% in the original imbalanced dataset to 2.60%. Second, a Random Forest model for predicting the severity of taxi drivers' traffic violations achieved accuracy, m_F1, m_G-mean, m_AUC, and m_AP of 0.877, 0.849, 0.599, 0.976, and 0.957, respectively, outperforming Decision Tree, XGBoost, AdaBoost, and Neural Network models on these measures. Finally, the SHAP framework was used to improve the interpretability of the model and identify important factors affecting taxi drivers' traffic violations. Functional districts, location of the violation, and road grade had a high impact on the probability of traffic violations, with mean SHAP values of 0.39, 0.36, and 0.26, respectively.
Conclusions: The findings of this paper may help to clarify the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing taxi drivers' traffic violations and improving road safety management.
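
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the three severity classes and their proportions are hypothetical, and `class_weight="balanced_subsample"` is swapped in for the Balanced Bagging Classifier step. The macro G-mean is computed here as the geometric mean of per-class recalls, which is one common definition and an assumption about what the paper calls m_G-mean.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic stand-in for the enforcement records: 11 features and 3 severity
# classes with a skewed distribution (all values here are hypothetical).
X, y = make_classification(
    n_samples=3000, n_features=11, n_informative=6, n_redundant=2,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: class reweighting inside each bootstrap sample is used here as
# a simple substitute for the paper's Balanced Bagging step.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced_subsample", random_state=0
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)
# One-hot encode the true labels for the macro average-precision score.
y_onehot = label_binarize(y_test, classes=model.classes_)

per_class_recall = recall_score(y_test, y_pred, average=None)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "macro_f1": f1_score(y_test, y_pred, average="macro"),
    # Geometric mean of per-class recalls.
    "macro_g_mean": float(np.prod(per_class_recall) ** (1 / len(per_class_recall))),
    "macro_auc": roc_auc_score(y_test, y_prob, multi_class="ovr", average="macro"),
    "macro_ap": average_precision_score(y_onehot, y_prob, average="macro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the G-mean multiplies per-class recalls, a single poorly recalled minority class pulls it down sharply even when accuracy and AUC stay high, which is one plausible reading of the gap between the reported m_G-mean and the other measures.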
Pages: 362-370
Page count: 9