Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

Author
Serpil Yalcin Kuzu
Affiliation
[1] Firat University, Department of Physics, Faculty of Science
Keywords
Imbalanced dataset; Multiclass classification; Random forest classifier; Resampling; Upsilon states; Weighted random forest classifier; 68T05; 68T45;
Abstract
Data used in particle physics analyses are inherently imbalanced: the events of interest are rare relative to the broad background. These events can be separated from the bulk through computationally intensive studies that apply sophisticated analysis techniques. Classification algorithms from supervised machine learning (ML) can be used to interpret skewed particle datasets as an alternative to classical techniques, even for multi-particle-state analyses. In this study, the ground state of bottomonium, Υ(1S), and its excited states, Υ(2S) and Υ(3S), were studied with a multiclass classification approach based on the random forest classifier (RFC), an example of a novel ML approach in particle analysis, combined with resampling techniques for preprocessing the dataset and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine oversampling and undersampling, were applied with the RFC. In addition, an RFC with class weights, the weighted random forest (WRF), was used in the analysis. Because of the structure of the data, the performance of the models was evaluated with metrics derived from the confusion matrix.
The results show that hybrid techniques combined with the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores of the upsilon states show that the model achieved its highest classification performance, around 90%, with the SMOTETomek strategy, along with high sensitivity, indicating that the approach succeeds at multiclass classification.
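The abstract does not spell out how G-mean and BAcc are computed per upsilon state. A minimal sketch follows, assuming the common one-vs-rest definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), G-mean = sqrt(sensitivity × specificity), and BAcc = (sensitivity + specificity) / 2. The function name `one_vs_rest_scores`, the class layout (a background class plus the three states), and all counts in the matrix are hypothetical and are not taken from the study.

```python
import math


def one_vs_rest_scores(cm):
    """Per-class sensitivity, specificity, G-mean and balanced accuracy.

    cm[i][j] is the number of events with true class i predicted as class j.
    Each class is scored against all other classes combined (one-vs-rest).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        scores.append({
            "sensitivity": sens,
            "specificity": spec,
            "g_mean": math.sqrt(sens * spec),
            "bacc": (sens + spec) / 2,
        })
    return scores


# Hypothetical confusion matrix (illustrative counts only, not study results).
# Assumed row/column order: background, Y(1S), Y(2S), Y(3S).
labels = ["background", "Y(1S)", "Y(2S)", "Y(3S)"]
cm = [
    [9000, 400, 300, 300],
    [20, 170, 8, 2],
    [10, 6, 80, 4],
    [5, 2, 3, 40],
]

for label, s in zip(labels, one_vs_rest_scores(cm)):
    print(f"{label:>10}: sens={s['sensitivity']:.3f} "
          f"G-mean={s['g_mean']:.3f} BAcc={s['bacc']:.3f}")
```

Both scores weight the rare class and the remaining classes equally, which is why they are preferred over plain accuracy when the background dominates the sample. G-mean additionally drops toward zero whenever either sensitivity or specificity collapses, so it penalizes a model that ignores a rare state.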