Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Cited by: 16
Authors
Tkachev, Victor [1]
Sorokin, Maxim [1,2]
Borisov, Constantin [3]
Garazha, Andrew [1]
Buzdin, Anton [1,2,4,5]
Borisov, Nicolas [1,2,4]
Affiliations
[1] OmicsWay Corp, Walnut, CA 91788 USA
[2] IM Sechenov First Moscow State Med Univ, Inst Personalized Med, Moscow 119991, Russia
[3] Natl Res Univ Higher Sch Econ, Moscow 101000, Russia
[4] Moscow Inst Phys & Technol, Moscow 141701, Russia
[5] Shemyakin Ovchinnikov Inst Bioorgan Chem, Moscow 117997, Russia
Funding
Russian Foundation for Basic Research;
Keywords
bioinformatics; personalized medicine; oncology; chemotherapy; machine learning; omics profiling; COMPLETE RESPONSE; II ERROR; EXPRESSION; CLASSIFICATION; CHEMOTHERAPY; THERAPY; CANCER; BORTEZOMIB; INHIBITOR; SELECTION;
DOI
10.3390/ijms21030713
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs because few case histories combine clinical outcomes with high-throughput molecular data. This shortage leads to overtraining and makes most ML methods highly vulnerable. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear support vector machine (SVM), k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naive Bayes (BNB), adaptive boosting (ADA), and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41-235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61-0.88 range to 0.70-0.94. We tested FloWPS-empowered methods for overtraining by examining the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.
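The abstract describes data trimming only as sample-specific removal of features that would require extrapolation. As a rough illustration of that idea, and not the authors' FloWPS implementation, the sketch below keeps a feature for a new sample only when the sample's value lies inside the range covered by the training data. The function name `trim_features`, the threshold `m`, and the exact counting rule are assumptions made for this example.

```python
import numpy as np

def trim_features(X_train, x_new, m=1):
    """Return indices of features where x_new is not an extrapolation.

    Hypothetical rule (an assumption, not taken from the paper): keep a
    feature when at least m training samples lie at or below the new
    sample's value and at least m lie at or above it.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n_below = (X_train <= x_new).sum(axis=0)  # per-feature count of training values <= x_new
    n_above = (X_train >= x_new).sum(axis=0)  # per-feature count of training values >= x_new
    keep = (n_below >= m) & (n_above >= m)
    return np.flatnonzero(keep)

# Toy data: 3 training samples, 3 features.
X_train = np.array([[1.0, 5.0, 0.2],
                    [2.0, 6.0, 0.4],
                    [3.0, 7.0, 0.6]])
x_new = np.array([2.5, 9.0, 0.1])

kept = trim_features(X_train, x_new, m=1)
print(kept)  # only feature 0 is inside the training range
```

Any of the classifiers listed in the abstract could then be trained on `X_train[:, kept]` and applied to `x_new[kept]`, so that each new sample gets its own reduced feature set.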
Pages: 20