Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

Author
Serpil Yalcin Kuzu
Affiliation
[1] Firat University, Department of Physics, Faculty of Science
Keywords
Imbalanced dataset; Multiclass classification; Random forest classifier; Resampling; Upsilon states; Weighted random forest classifier; 68T05; 68T45
DOI
Not available
Abstract
Data used in particle physics analyses are inherently imbalanced: the events of interest are rare against a broad background. These events can be separated from the bulk through computationally intensive studies that apply sophisticated analysis techniques. Classification algorithms from supervised machine learning (ML) can be used to interpret skewed particle datasets as an alternative to the classic techniques, even for multi-particle-state analyses. In this study, the ground state of bottomonium, Υ(1S), and its excited states, Υ(2S) and Υ(3S), were studied with a multiclass classification approach based on the random forest classifier (RFC), a novel example of an ML approach in particle analysis, combined with resampling techniques for preprocessing the dataset and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine over- and undersampling, were applied with the RFC. In addition, an RFC with class weights, the weighted random forest (WRF), was used in the analysis. Because of the data structure, the performance of the applied models was evaluated with metrics derived from the confusion matrix.
The results show that hybrid techniques implemented with the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores for the upsilon states showed that the model achieved its highest classification performance, around 90%, with the SMOTETomek strategy, together with high sensitivity, indicating the success of the approach for multiclass classification.
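The abstract reports G-mean and BAcc as metrics derived from the confusion matrix. As a minimal sketch, the two scores can be computed from per-class recall (sensitivity) as shown below. It assumes the common multiclass definitions, with BAcc as the arithmetic mean and G-mean as the geometric mean of the per-class recalls; the record does not state which variants the paper used. The function names, the four-class layout and all counts are hypothetical illustrations, not results from the paper.

```python
import math


def per_class_recall(cm):
    """Recall (sensitivity) per class; cm[i][j] counts true class i predicted as j."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        # A class with no true samples gets recall 0.0 here (an assumption).
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def balanced_accuracy(cm):
    """Arithmetic mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def g_mean(cm):
    """Geometric mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical confusion matrix: one dominant class plus three rare classes.
# The counts are made up purely to illustrate the arithmetic.
cm = [
    [900, 40, 30, 30],
    [5, 45, 0, 0],
    [2, 0, 16, 2],
    [1, 0, 1, 8],
]

print("per-class recall:", per_class_recall(cm))
print("BAcc:", balanced_accuracy(cm))
print("G-mean:", g_mean(cm))
```

Both scores weight every class equally regardless of its size, which is why they suit highly skewed data better than plain accuracy. The G-mean additionally drops to zero if any single class is never recovered.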
Related papers
50 in total
  • [1] Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data
    Kuzu, Serpil Yalcin
    JOURNAL OF SCIENTIFIC COMPUTING, 2023, 95 (01)
  • [2] An analytical study of the classification of highly skewed data
    Siddiqui, Fatima
    Ali, Qazi M.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (10) : 7582 - 7601
  • [3] Classification of cattle breeds based on the random forest approach
    Kasarda, Radovan
    Moravcikova, Nina
    Meszaros, Gabor
    Simcic, Mojca
    Zaborski, Daniel
    LIVESTOCK SCIENCE, 2023, 267
  • [4] MANDARIN STOPS CLASSIFICATION BASED ON RANDOM FOREST APPROACH
    Lin, Chi-Yueh
    Wang, Hsiao-Chuan
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 241 - 244
  • [5] Bayesian approach to incremental batch learning on forest cover sensor data for multiclass classification
    Prasad, Venkata Vara D.
    Venkataramana, Lokeswari Y.
    Saraswathi, S.
    Mathew, Sarah
    Snigdha, V
    CONCURRENT ENGINEERING-RESEARCH AND APPLICATIONS, 2021, 29 (04) : 405 - 414
  • [6] Gene Selection and Classification Approach for Microarray Data based on Random Forest Ranking and BBHA
    Pashaei, Elnaz
    Ozen, Mustafa
    Aydin, Nizamettin
    2016 3RD IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS, 2016, : 308 - 311
  • [7] Bias analysis in text classification for highly skewed data
    Tang, L
    Liu, H
    FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 781 - 784
  • [8] Imbalanced educational data classification: an effective approach with resampling and random forest
    Vo Thi Ngoc Chau
    Nguyen Hua Phung
    PROCEEDINGS OF 2013 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2013, : 135 - 140
  • [9] Nonlinear Random Forest Classification, a Copula-Based Approach
    Mesiar, Radko
    Sheikhi, Ayyub
    APPLIED SCIENCES-BASEL, 2021, 11 (15)
  • [10] A Density-Based Random Forest for Imbalanced Data Classification
    Dong, Jia
    Qian, Quan
    FUTURE INTERNET, 2022, 14 (03)