An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

Cited by: 0

Authors:

Nastasiya F. Grinberg

Oghenejokpeme I. Orhobor

Ross D. King

Affiliations:

[1] University of Manchester, School of Computer Science

[2] Cambridge Institute of Therapeutic Immunology & Infectious Disease, Department of Medicine

[3] Jeffrey Cheah Biomedical Centre, Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology

[4] Cambridge Biomedical Campus

[5] University of Cambridge

[6] Chalmers University of Technology

Source:

Machine Learning | 2020, Volume 109

Keywords:

Random forest; Gradient boosting machines; Support vector machines; Lasso regression; Ridge regression; BLUP; GWAS; Statistical genetics; Plant biology

DOI:

Not available

Abstract:

In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of great societal importance because they are central to medicine, crop breeding, and related fields. We investigated three phenotype prediction problems: one simple and clean (yeast), and two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods: genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies, the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when it is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which method is likely to perform well on any given problem is elusive and non-trivial.

Citation

Pages: 251-277

Page count: 26
