An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

Authors
Nastasiya F. Grinberg
Oghenejokpeme I. Orhobor
Ross D. King
Affiliations
[1] University of Manchester, School of Computer Science
[2] Cambridge Institute of Therapeutic Immunology & Infectious Disease, Department of Medicine
[3] Jeffrey Cheah Biomedical Centre, Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology
[4] Cambridge Biomedical Campus
[5] University of Cambridge
[6] Chalmers University of Technology
Source
Machine Learning | 2020 / Vol. 109, Issue 2
Keywords
Random forest; Gradient boosting machines; Support vector machines; Lasso regression; Ridge regression; BLUP; GWAS; Statistical genetics; Plant biology
Abstract
In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are central to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods, namely elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM), with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies, the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.
Pages: 251-277
Page count: 26