Predicting Breast Cancer via Supervised Machine Learning Methods on Class Imbalanced Data

Cited by: 0
Authors
Rajendran, Keerthana [1]
Jayabalan, Manoj [1, 2]
Thiruchelvam, Vinesh [1]
Affiliations
[1] Asia Pacific Univ Technol & Innovat, Sch Comp, Kuala Lumpur, Malaysia
[2] Liverpool John Moores Univ, Fac Engn & Technol, Liverpool, Merseyside, England
Keywords
Breast cancer; class imbalance; diagnosis; Bayesian network; DIAGNOSIS; MODEL; RISK; AGE
DOI
10.14569/IJACSA.2020.0110808
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Breast cancer, the second leading cause of death, is a widespread global health concern among women. Predicting its occurrence from risk factors paves the way for earlier diagnosis and more timely, efficient treatment. Although many predictive models for breast cancer have been developed, most were built from highly imbalanced data. Models trained on imbalanced data are usually biased towards the majority class, yet in cancer diagnosis it is crucial to correctly identify patients with cancer, who are often the minority class. This study applies three class balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique (SMOTE)), undersampling (SpreadSubsample), and a hybrid method (SMOTE and SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before constructing the supervised learning models. The algorithms employed are Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was then tested on the validation data to determine the final predictive model. Classifier performance was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
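The hybrid balancing idea and the sensitivity/specificity metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the predictions are all hypothetical, the oversampler only mimics the core SMOTE step (interpolating between a minority sample and one of its nearest minority neighbours), and plain random undersampling stands in for the SpreadSubsample filter.

```python
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples (a simplified stand-in for
    an undersampling filter such as SpreadSubsample)."""
    return random.Random(seed).sample(majority, n_keep)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up): 4 minority points and 20 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i % 5), float(i // 5) + 3.0) for i in range(20)]

# Hybrid balancing: grow the minority to 8 and shrink the majority to 8.
balanced_minority = minority + smote_like_oversample(minority, n_new=4, k=3)
balanced_majority = undersample(majority, n_keep=8)

# Metrics on hypothetical predictions (1 = cancer, 0 = no cancer).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting sensitivity and specificity separately matters on imbalanced data because a classifier that always predicts the majority class can reach high accuracy while its sensitivity for the minority (cancer) class is zero.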
Pages: 54-63
Number of pages: 10