Predicting Breast Cancer via Supervised Machine Learning Methods on Class Imbalanced Data

Authors
Rajendran, Keerthana [1]
Jayabalan, Manoj [1, 2]
Thiruchelvam, Vinesh [1]
Affiliations
[1] Asia Pacific Univ Technol & Innovat, Sch Comp, Kuala Lumpur, Malaysia
[2] Liverpool John Moores Univ, Fac Engn & Technol, Liverpool, Merseyside, England
Keywords
Breast cancer; class imbalance; diagnosis; Bayesian network; model; risk; age
DOI
10.14569/IJACSA.2020.0110808
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Breast cancer, the second leading cause of death among women, is a widespread global health concern. Predicting the occurrence of breast cancer from risk factors paves the way to earlier diagnosis and more timely, efficient treatment. Although many predictive models for breast cancer have been developed, most were built from highly imbalanced data. Models trained on imbalanced data are usually biased towards the majority class, yet in cancer diagnosis it is crucial to correctly identify patients with cancer, who are often the minority class. This study applies three class balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique (SMOTE)), undersampling (SpreadSubsample), and a hybrid method (SMOTE and SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before training the supervised learning models. The algorithms employed are Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was tested on the validation data to determine the final predictive model. Classifier performance was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
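To make the abstract's two central ideas concrete, the following is a minimal sketch in standard-library Python, not the authors' pipeline. It assumes a binary label where `1` marks the minority (cancer) class. The function `undersample` is a hypothetical, simplified stand-in for an undersampling filter such as SpreadSubsample: it randomly drops rows until every class matches the smallest one (a 1:1 ratio), which is only one possible configuration. The function `sensitivity_specificity` computes the two evaluation metrics from true and predicted labels. SMOTE is not reproduced here.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest class.

    Simplified illustration of undersampling to a 1:1 class ratio;
    the name and behaviour are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: 2 positive cases, 8 negative cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A classifier that always predicts the majority class.
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(accuracy, sens, spec)  # 0.8 0.0 1.0
```

The toy example shows why accuracy alone is misleading on imbalanced data: a classifier that never predicts cancer still scores 0.8 accuracy while its sensitivity is 0.0, which is the failure that balancing the training data is meant to reduce.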
Pages: 54-63
Page count: 10