Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Cited by: 16
Authors
Tkachev, Victor [1]
Sorokin, Maxim [1,2]
Borisov, Constantin [3]
Garazha, Andrew [1]
Buzdin, Anton [1,2,4,5]
Borisov, Nicolas [1,2,4]
Affiliations
[1] OmicsWay Corp, Walnut, CA 91788 USA
[2] IM Sechenov First Moscow State Med Univ, Inst Personalized Med, Moscow 119991, Russia
[3] Natl Res Univ Higher Sch Econ, Moscow 101000, Russia
[4] Moscow Inst Phys & Technol, Moscow 141701, Russia
[5] Shemyakin Ovchinnikov Inst Bioorgan Chem, Moscow 117997, Russia
Funding
Russian Foundation for Basic Research
Keywords
bioinformatics; personalized medicine; oncology; chemotherapy; machine learning; omics profiling; COMPLETE RESPONSE; II ERROR; EXPRESSION; CLASSIFICATION; CHEMOTHERAPY; THERAPY; CANCER; BORTEZOMIB; INHIBITOR; SELECTION;
DOI
10.3390/ijms21030713
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs because few case histories combine clinical outcomes with high-throughput molecular data. This shortage causes overtraining and makes most ML methods highly vulnerable. Recently, we proposed a hybrid global-local approach to ML, termed the floating window projective separator (FloWPS), that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear support vector machine (SVM), k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naive Bayes (BNB), adaptive boosting (ADA), and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41-235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61-0.88 range to 0.70-0.94. We tested FloWPS-empowered methods for overtraining by comparing the importance of different features across ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for building ML classifiers in personalized oncology.
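The abstract describes data trimming only as "sample-specific removal of irrelevant features" that avoids extrapolation in the feature space. The following is a minimal sketch of one way such a rule could look, not the authors' implementation. The function names `trim_features` and `project`, the parameter `m`, and the exact keep/drop criterion are assumptions made for illustration: a feature is kept for a given sample only if at least `m` training values lie on each side of the sample's value.

```python
from typing import List, Sequence


def trim_features(train: Sequence[Sequence[float]],
                  sample: Sequence[float],
                  m: int = 1) -> List[int]:
    """Return the indices of features kept for this particular sample.

    Assumed rule (not stated in the abstract): keep a feature only if at
    least `m` training values are <= the sample's value and at least `m`
    are >= it, so the sample is interpolated rather than extrapolated.
    """
    kept = []
    for j, value in enumerate(sample):
        column = [row[j] for row in train]
        below = sum(1 for v in column if v <= value)
        above = sum(1 for v in column if v >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def project(rows: Sequence[Sequence[float]],
            kept: Sequence[int]) -> List[List[float]]:
    """Restrict every row to the kept feature indices."""
    return [[row[j] for j in kept] for row in rows]


# Toy data: three training samples, three features (hypothetical values).
train = [
    [1.0, 10.0, 5.0],
    [2.0, 12.0, 6.0],
    [3.0, 11.0, 7.0],
]
# The second feature (20.0) lies outside the training range of 10.0 to 12.0.
sample = [2.5, 20.0, 5.5]

kept = trim_features(train, sample, m=1)
trimmed_train = project(train, kept)
trimmed_sample = [sample[j] for j in kept]
print(kept, trimmed_sample)
```

Under these assumptions, any global classifier would then be fitted on `trimmed_train` and applied to `trimmed_sample`, with the trimming repeated for each new sample.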
Pages: 20
Related papers
50 entries in total
  • [31] Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance
    Ahsan, Md Manjurul
    Mahmud, M. A. Parvez
    Saha, Pritom Kumar
    Gupta, Kishor Datta
    Siddique, Zahed
    TECHNOLOGIES, 2021, 9 (03)
  • [32] Evaluating global intelligence innovation: An index based on machine learning methods
    Ma, Xiaoyu
    Hao, Yizhi
    Li, Xiao
    Liu, Jun
    Qi, Jiasen
    TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2023, 194
  • [33] Analysis of Drug Sales Data based on Machine Learning Methods
    Al-Gunaid, Mohammed A.
    Shcherbakov, Maxim V.
    Kravets, Alla G.
    Loshmanov, Vadim I.
    Shumkin, Alexandr M.
    Trubitsin, Vladislav V.
    Vakulenko, Darya V.
    PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON SYSTEM MODELING & ADVANCEMENT IN RESEARCH TRENDS (SMART), 2018, : 32 - 38
  • [34] Research and analysis of psychological data based on machine learning methods
    Chen G.
    Lv W.
    Ma J.
    Liang Y.
    International Journal of Wireless and Mobile Computing, 2022, 22 (01) : 1 - 8
  • [35] Ship Classification Based on AIS Data and Machine Learning Methods
    Huang, I-Lun
    Lee, Man-Chun
    Nieh, Chung-Yuan
    Huang, Juan-Chen
    ELECTRONICS, 2024, 13 (01)
  • [36] Machine Learning Based Methods Used for Improving Scholar Performance
    Boncea, Radu
    Petre, Ionut
    Vevera, Victor
    Gheorghita, Alexandru
    NEW TECHNOLOGIES AND REDESIGNING LEARNING SPACES, VOL II, 2019, : 471 - 478
  • [37] Global-scale biomass estimation based on machine learning and deep learning methods
    Talebiesfandarani, Somayeh
    Shamsoddini, Ali
    REMOTE SENSING APPLICATIONS-SOCIETY AND ENVIRONMENT, 2022, 28
  • [38] Application of Machine Learning Methods to Improve the Performance of Ultrasound in Head and Neck Oncology: A Literature Review
    DeJohn, Celia R.
    Grant, Sydney R.
    Seshadri, Mukund
    CANCERS, 2022, 14 (03)
  • [39] Improving survival prediction using flexible late fusion machine learning framework for multi-omics data integration
    Nikolaou, Nikos
    Salazar, Domingo
    RaviPrakash, Harish
    Goncalves, Miguel
    Argoty, Gustavo Alonso Arango
    Burlutsky, Nikolay
    Markuzon, Natasha
    Jacob, Etai
    CANCER RESEARCH, 2023, 83 (07)
  • [40] Personalized programming education: Using machine learning to boost learning performance based on students' personality traits
    Tseng, Chun-Hsiung
    Lin, Hao-Chiang Koong
    Huang, Andrew Chih-Wei
    Lin, Jia-Rou
    COGENT EDUCATION, 2023, 10 (02)