An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

Cited by: 0

Authors:

Nastasiya F. Grinberg

Oghenejokpeme I. Orhobor

Ross D. King

Affiliations:

[1] University of Manchester, School of Computer Science

[2] Cambridge Institute of Therapeutic Immunology & Infectious Disease, Department of Medicine

[3] Jeffrey Cheah Biomedical Centre, Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology

[4] Cambridge Biomedical Campus

[5] University of Cambridge

[6] Chalmers University of Technology

Source:

Machine Learning | 2020 / Vol. 109

Keywords:

Random forest; Gradient boosting machines; Support vector machines; Lasso regression; Ridge regression; BLUP; GWAS; Statistical genetics; Plant biology

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance because they are central to medicine, crop breeding, and related fields. We investigated three phenotype prediction problems: one simple and clean (yeast), and two complex and real-world (rice and wheat). We compared standard machine learning methods, namely elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM), with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies, the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when it is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.

Pages: 251-277

Number of pages: 26

50 records in total

[1] An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat
Grinberg, Nastasiya F.
Orhobor, Oghenejokpeme I.
King, Ross D.
MACHINE LEARNING, 2020, 109 (02) : 251 - 277
[2] An evaluation of machine-learning methods for predicting pneumonia mortality
Cooper, GF
Aliferis, CF
Ambrosino, R
Aronis, J
Buchanan, BG
Caruana, R
Fine, MJ
Glymour, C
Gordon, G
Hanusa, BH
Janosky, JE
Meek, C
Mitchell, T
Richardson, T
Spirtes, P
ARTIFICIAL INTELLIGENCE IN MEDICINE, 1997, 9 (02) : 107 - 138
[3] EXPLAINABLE MACHINE-LEARNING FOR PREDICTING PREOPERATIVE FRAILTY PHENOTYPE USING ELECTRONIC HEALTH RECORDS
Mardini, Mamoun
Price, Catherine
Tighe, Patrick
Manini, Todd
INNOVATION IN AGING, 2022, 6 : 564 - 564
[4] Evaluation of supervised machine-learning methods for predicting appearance traits from DNA
Katsara, Maria-Alexandra
Branicki, Wojciech
Walsh, Susan
Kayser, Manfred
Nothnagel, Michael
FORENSIC SCIENCE INTERNATIONAL-GENETICS, 2021, 53
[5] Machine-Learning Studies on Spin Models
Shiina, Kenta
Mori, Hiroyuki
Okabe, Yutaka
Lee, Hwee Kuan
SCIENTIFIC REPORTS, 2020, 10 (01)
[6] Machine-Learning Studies on Spin Models
Kenta Shiina
Hiroyuki Mori
Yutaka Okabe
Hwee Kuan Lee
Scientific Reports, 10
[7] Predicting Perovskite Performance with Multiple Machine-Learning Algorithms
Li, Ruoyu
Deng, Qin
Tian, Dong
Zhu, Daoye
Lin, Bin
CRYSTALS, 2021, 11 (07)
[8] A Machine-Learning Approach to Predicting Need for Hospitalization for Pediatric
Patel, Shilpa J.
Chamberlain, Daniel
Chamberlain, James M.
PEDIATRICS, 2018, 142
[9] Applicability of Machine-Learning Techniques in Predicting Customer Defection
Prasasti, Niken
Ohwada, Hayato
2014 1ST INTERNATIONAL SYMPOSIUM ON TECHNOLOGY MANAGEMENT AND EMERGING TECHNOLOGIES (ISTMET 2014), 2014, : 157 - 162
[10] Predicting loss aversion behavior with machine-learning methods
Ömür Saltık
Wasim ul Rehman
Rıdvan Söyü
Süleyman Değirmen
Ahmet Şengönül
Humanities and Social Sciences Communications, 10