Feature Selection and Comparative Analysis of Breast Cancer Prediction Using Clinical Data and Histopathological Whole Slide Images

Cited by: 0
Authors
Mohammed, Sarfaraz Ahmed [1]
Abeysinghe, Senuka [2]
Ralescu, Anca [1]
Affiliations
[1] Univ Cincinnati, Dept Comp Sci, Cincinnati, OH 45221 USA
[2] Indian Hill High Sch, Ohios Coll, Credit Plus Program, Cincinnati, OH 45243 USA
Keywords
Breast cancer; Machine learning; Principal component analysis; Particle swarm optimization; Feature selection; Logistic regression; Naïve Bayes classification; k-NN; Support vector machines; Random forest; K-Means; Whole slide images; TCGA; Histopathology; Deep learning; Digital image analysis; Convolutional neural network; H&E-stained images; Nuclei segmentation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Breast carcinoma is a common cancer among women, with invasive ductal carcinoma and lobular carcinoma being the two most frequent types. Early detection is critical to keep the cancer from progressing. Diagnostic tests include mammography, ultrasound, MRI, and biopsy. Machine learning algorithms can play a key role in analyzing complex clinical datasets to predict disease outcomes. This study uses machine learning and deep learning techniques to analyze publicly available clinical and medical image data. For clinical data, Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) are applied to the Wisconsin Breast Cancer dataset (WDBC) for feature selection, and the performance of each is evaluated in distinguishing between benign and malignant tumors. The results show that the Random Forest (RF) classifier outperforms the other classification algorithms under both PSO and PCA feature selection, achieving predictive accuracies of 95.7% and 97.2%, respectively. The first part of the paper contains a comprehensive analysis of the two feature selection methods on clinical data to optimize predictive performance. The second part of the paper is concerned with image data. Although histopathological Whole Slide Imaging (WSI) has been validated for a variety of pathological applications over more than two decades, manual detection of cancerous tumors remains challenging and prone to human error. Given the potential of deep learning models to aid pathologists in detecting cancer subtypes, and the increasing ability of current image analysis techniques to identify underlying genomic data and cancer-causing mutations, the second half of the paper focuses on feature extraction using a deep convolutional neural network (U-Net) trained on WSIs from The Cancer Genome Atlas (TCGA) to accurately classify and extract relevant features. The focus is on feature extraction, nuclei-based instance segmentation, H&E-stained image extraction, and quantifying intensity information for a given WSI to classify the disease type. A comprehensive analysis of feature selection methods is presented for both clinical and medical image data.
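The clinical-data part of the abstract (PCA followed by a Random Forest classifier on WDBC) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn is installed and uses its bundled copy of the WDBC data. The number of components (10), the number of trees (200), and the 5-fold cross-validation are illustrative assumptions rather than settings from the paper, and the PSO-based selection step is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# WDBC as bundled with scikit-learn: 569 samples, 30 numeric features,
# binary target (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)

# Assumed settings (not from the paper): 10 principal components, 200 trees.
# Scaling comes first because PCA is sensitive to feature scale.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=200, random_state=0),
)

# 5-fold cross-validated accuracy; the whole pipeline is refit inside each
# fold, so the scaler and PCA never see the held-out samples.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean 5-fold accuracy: {mean_accuracy:.3f}")
```

Because the evaluation protocol here is assumed, the printed accuracy is not directly comparable to the 95.7% and 97.2% figures reported in the abstract.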
Pages: 1494-1525
Page count: 32
Related papers
50 in total
  • [32] A model to perform prediction based on feature extraction of histopathological images of the breast
    Nagdeote, Sushma
    Prabhu, Sapna
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (06) : 18119 - 18146
  • [33] Minimizing False Negatives in Metastasis Prediction for Breast Cancer Patients Through a Deep Stacked Ensemble Analysis of Whole Slide Images
    Munappa, Sunitha
    Subhashini, J.
    Suhasini, Pallikonda Sarah
    TRAITEMENT DU SIGNAL, 2023, 40 (03) : 1289 - 1295
  • [34] Lung cancer disease prediction with CT scan and histopathological images feature analysis using deep learning techniques
    Rajasekar, Vani
    Vaishnnave, M. P.
    Premkumar, S.
    Sarveshwaran, Velliangiri
    Rangaraaj, V.
    RESULTS IN ENGINEERING, 2023, 18
  • [35] Thyroid Cancer Malignancy Prediction From Whole Slide Cytopathology Images
    Dov, David
    Kovalsky, Shahar Z.
    Cohen, Jonathan
    Range, Danielle Elliott
    Henao, Ricardo
    Carin, Lawrence
    MACHINE LEARNING FOR HEALTHCARE CONFERENCE, VOL 106, 2019, 106
  • [36] Analysis of Histopathological Images for Prediction of Breast Cancer Using Traditional Classifiers with Pre-Trained CNN
    Gupta, Karan
    Chawla, Nidhi
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 : 878 - 889
  • [37] Prediction of cancer recurrence based on compact graphs of whole slide images
    Zhang, Fengyun
    Geng, Jie
    Zhang, De-Gan
    Gui, Jinglong
    Su, Ran
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 167
  • [38] Breast Cancer Prediction: Importance of Feature Selection
    Prateek
    ADVANCES IN COMPUTER COMMUNICATION AND COMPUTATIONAL SCIENCES, IC4S 2018, 2019, 924 : 733 - 742
  • [39] Region of interest (ROI) selection using vision transformer for automatic analysis using whole slide images
    Hossain, Md Shakhawat
    Shahriar, Galib Muhammad
    Syeed, M. M. Mahbubul
    Uddin, Mohammad Faisal
    Hasan, Mahady
    Shivam, Shingla
    Advani, Suresh
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [40] Ink Removal in Whole Slide Images using Hallucinated Data
    Ramanathan, Vishwesh
    Han, Wenchao
    Bassiouny, Dina
    Rakovitch, Eileen
    Martel, Anne L.
    MEDICAL IMAGING 2023, 2023, 12471