Cost-sensitive learning for semi-supervised hit-and-run analysis

Cited by: 14
Authors
Zhu, Siying [1]
Wan, Jianwu [1]
Affiliations
[1] Nanyang Technol Univ, Sch Civil & Environm Engn, Singapore, Singapore
Source
Keywords
Hit-and-run; Cost-sensitive; Semi-supervised learning; Imbalanced dataset; Unlabelled data; CRASHES; ACCIDENTS; VEHICLE; BARRIERS; NETWORK; MODEL; ROAD;
DOI
10.1016/j.aap.2021.106199
Chinese Library Classification (CLC) number
TB18 [Ergonomics];
Discipline classification code
1201;
Abstract
Hit-and-run crashes not only undermine public morality but also delay the medical care provided to victims. However, hit-and-run data are class-imbalanced, because the number of hit-and-run crashes is much smaller than that of non-hit-and-run crashes. Crash analysis also suffers from a missing label problem, caused by factors such as data barriers, so the information hidden in unlabelled samples has not been used effectively. In this paper, a cost-sensitive semi-supervised logistic regression (CS3LR) model is proposed for hit-and-run analysis to tackle both the class-imbalanced data distribution and the missing label problem, based on the crash dataset of Victoria, Australia (2013-2019). By performing label estimation with logistic regression that jointly uses labelled data and unlabelled data with pseudo labels in a cost-sensitive semi-supervised maximum likelihood framework, the proposed model can obtain an unbiased likelihood parameter for hit-and-run prediction and analysis. Experimental comparisons with two logistic regression models and seven machine learning methods show that the CS3LR model performs better. The most significant contributing factors to hit-and-run crashes extracted by CS3LR with only 10% labelled data are highly consistent with the true contributing factors obtained by supervised cost-sensitive logistic regression with complete hit-and-run labels. The effects of the class-weighted ratio and the hyper-parameter lambda on the performance of the hit-and-run crash prediction model are also analysed. The results can inform recommendations on policies and countermeasures for preventing hit-and-run collisions and crimes. The proposed methodology can also be used to analyse crash data with other types of missing labels, such as crash severity.
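The general idea in the abstract (class-weighted logistic regression that also learns from pseudo-labelled data) can be illustrated with a minimal sketch. This is not the authors' CS3LR formulation: it swaps in plain self-training with hard pseudo labels, and the names `pos_cost` (extra weight on the minority class) and `lam` (down-weighting of pseudo-labelled samples), their values, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_weighted_lr(X, y, sample_w, n_iter=200, lr=0.5):
    """Gradient ascent on a sample-weighted logistic log-likelihood."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    total = sum(sample_w)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for xi, yi, si in zip(X, y, sample_w):
            err = si * (yi - predict_proba(w, xi))
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * g / total for wj, g in zip(w, grad)]
    return w

def cs_self_training(X_lab, y_lab, X_unl, pos_cost=5.0, lam=0.5, rounds=5):
    """Hypothetical cost-sensitive self-training loop (assumed, not CS3LR)."""
    def cost(label):
        # Misclassifying the rare positive class costs more (assumed ratio).
        return pos_cost if label == 1 else 1.0

    lab_w = [cost(yi) for yi in y_lab]
    w = fit_weighted_lr(X_lab, y_lab, lab_w)
    for _ in range(rounds):
        # Assign hard pseudo labels to the unlabelled samples.
        pseudo = [1 if predict_proba(w, x) >= 0.5 else 0 for x in X_unl]
        # Refit on labelled plus pseudo-labelled data; lam scales the latter.
        weights = lab_w + [lam * cost(p) for p in pseudo]
        w = fit_weighted_lr(X_lab + X_unl, y_lab + pseudo, weights)
    return w

# Synthetic, imbalanced toy data: 200 negatives, 30 positives, ~10% labelled.
random.seed(0)
neg = [[random.gauss(-1, 1), random.gauss(-1, 1)] for _ in range(200)]
pos = [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(30)]
X_lab = neg[:20] + pos[:3]
y_lab = [0] * 20 + [1] * 3
X_unl = neg[20:] + pos[3:]

w = cs_self_training(X_lab, y_lab, X_unl)
print("intercept and coefficients:", w)
```

Raising `pos_cost` moves the decision boundary toward the majority class, which trades more false alarms for fewer missed positives; `lam` controls how strongly the pseudo-labelled samples influence the fit.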
Pages: 14
Related papers
50 items in total
  • [41] PRIVILEGED SEMI-SUPERVISED LEARNING
    Chen, Xingyu
    Gong, Chen
    Ma, Chao
    Huang, Xiaolin
    Yang, Jie
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2999 - 3003
  • [42] Introduction to semi-supervised learning
    Goldberg, Xiaojin
    Synthesis Lectures on Artificial Intelligence and Machine Learning, 2009, 6 : 1 - 116
  • [43] On Semi-Supervised Learning and Sparsity
    Balinsky, Alexander
    Balinsky, Helen
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 3083 - +
  • [44] A survey on semi-supervised learning
    Van Engelen, Jesper E.
    Hoos, Holger H.
    MACHINE LEARNING, 2020, 109 (02) : 373 - 440
  • [45] Semi-supervised learning with trees
    Kemp, C
    Griffiths, TL
    Stromsten, S
    Tenenbaum, JB
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 257 - 264
  • [46] Human Semi-Supervised Learning
    Gibson, Bryan R.
    Rogers, Timothy T.
    Zhu, Xiaojin
    TOPICS IN COGNITIVE SCIENCE, 2013, 5 (01) : 132 - 172
  • [47] Semi-supervised distribution learning
    Wen, Mengtao
    Jia, Yinxu
    Ren, Haojie
    Wang, Zhaojun
    Zou, Changliang
    BIOMETRIKA, 2024, 112 (01)
  • [48] Universal Semi-Supervised Learning
    Huang, Zhuo
    Xue, Chao
    Han, Bo
    Yang, Jian
    Gong, Chen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [49] Active Cost-Sensitive Learning
    Margineantu, Dragos D.
    19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1622 - 1623
  • [50] Analysis of imbalanced data using cost-sensitive learning
    Kim, Sojin
    Song, Jongwoo
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2025,