Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Cited by: 16
Authors
Tkachev, Victor [1]
Sorokin, Maxim [1,2]
Borisov, Constantin [3]
Garazha, Andrew [1]
Buzdin, Anton [1,2,4,5]
Borisov, Nicolas [1,2,4]
Affiliations
[1] OmicsWay Corp, Walnut, CA 91788 USA
[2] IM Sechenov First Moscow State Med Univ, Inst Personalized Med, Moscow 119991, Russia
[3] Natl Res Univ Higher Sch Econ, Moscow 101000, Russia
[4] Moscow Inst Phys & Technol, Moscow 141701, Russia
[5] Shemyakin Ovchinnikov Inst Bioorgan Chem, Moscow 117997, Russia
Funding
Russian Foundation for Basic Research;
Keywords
bioinformatics; personalized medicine; oncology; chemotherapy; machine learning; omics profiling; COMPLETE RESPONSE; II ERROR; EXPRESSION; CLASSIFICATION; CHEMOTHERAPY; THERAPY; CANCER; BORTEZOMIB; INHIBITOR; SELECTION;
DOI
10.3390/ijms21030713
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs because of a shortage of case histories in which the clinical outcome is supplemented by high-throughput molecular data. This shortage causes overtraining and high vulnerability in most ML methods. Recently, we proposed a hybrid global-local approach to ML, termed floating window projective separator (FloWPS), that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naive Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41-235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61-0.88 range to 0.70-0.94. We tested FloWPS-empowered methods for overtraining by comparing the importance assigned to different features by different ML methods on the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
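The data-trimming idea mentioned in the abstract (sample-specific removal of features so that a classifier does not have to extrapolate) can be illustrated with a minimal sketch. The rule used here is an assumption and is not stated in this record: a feature is kept for a given patient when at least `m` training values lie at or below the patient's value and at least `m` lie at or above it. The names `trim_features`, `project` and `m` are hypothetical, the numbers are made up, and the published FloWPS procedure may differ in detail.

```python
def trim_features(train_rows, sample, m=1):
    """Return indices of features for which `sample` is not an extrapolation.

    Assumed rule (illustrative only): feature j is kept when at least m
    training values are <= sample[j] and at least m are >= sample[j].
    """
    kept = []
    for j, value in enumerate(sample):
        column = [row[j] for row in train_rows]
        below = sum(1 for v in column if v <= value)
        above = sum(1 for v in column if v >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def project(rows, kept):
    """Keep only the selected feature columns of each row."""
    return [[row[j] for j in kept] for row in rows]


# Toy expression matrix: 4 training patients x 3 genes (made-up numbers).
train = [
    [1.0, 5.0, 0.2],
    [2.0, 6.0, 0.4],
    [3.0, 7.0, 0.6],
    [4.0, 8.0, 0.8],
]
new_patient = [2.5, 9.5, 0.1]

# Genes 1 and 2 fall outside the training range for this patient,
# so only gene 0 survives the trimming.
kept = trim_features(train, new_patient, m=1)
trimmed_train = project(train, kept)
```

In this sketch, any of the classifiers named in the abstract would then be fitted on `trimmed_train` and applied to the same columns of `new_patient`, so the feature set is chosen anew for each patient.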
Pages: 20