Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery

Cited by: 17
Authors
Guan, Xin [1, 2]
Runger, George [1]
Liu, Li [1, 3, 4]
Affiliations
[1] Arizona State Univ, Coll Hlth Solut, Phoenix, AZ 85004 USA
[2] Intel Corp, Chandler, AZ 85226 USA
[3] Arizona State Univ, Biodesign Inst, Tempe, AZ 85287 USA
[4] Mayo Clin, Dept Neurol, Scottsdale, AZ 85259 USA
Keywords
Biomarker discovery; Domain knowledge; Feature selection; Regularized random forest; FEATURE-SELECTION; CANCER; PREDICTION; MODELS
DOI
10.1186/s12859-020-3344-x
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: In biomarker discovery, applying domain knowledge is an effective way to eliminate false-positive features, prioritize functionally impactful markers, and facilitate the interpretation of predictive signatures. Several computational methods formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require the prior information to be encoded as a single score, and their algorithms are optimized for one specific type of biological knowledge. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed Know-GRRF (know-guided regularized random forest), a method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.

Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the Akaike information criterion (AIC) of out-of-bag predictions. The objective is to select a compact feature subset with high discriminative power and strong functional relevance to the biological phenotype. Through rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or by no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation, and pre-computed statistical scores as the prior knowledge used to assemble a panel of biomarkers, and discovered a compact set of biomarkers with significant improvements in prediction accuracy.

Conclusions: Know-GRRF is a powerful new method for incorporating knowledge from multiple domains into feature selection, with a broad range of applications in biomarker discovery. We implemented this method and released it as the KnowGRRF package in the R/CRAN archive.
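The abstract describes Know-GRRF only at a high level. As a rough, hypothetical illustration of its moving parts (a linear composite of domain scores, a score-dependent regularization penalty, and weight tuning against an out-of-bag AIC), the Python sketch below rests on several assumptions that the abstract does not state: per-domain min-max normalization, non-negative domain weights summing to 1, a GRRF-style penalty λ_i = (1 − γ)λ0 + γ·s_i, an AIC computed as 2k − 2 ln L from binary out-of-bag probabilities with k the number of selected features, and a plain grid search in place of the real optimizer. The callback `fit_and_score` and every function name here are hypothetical; this is not the KnowGRRF package's code or API.

```python
"""Toy sketch of knowledge-guided regularization with AIC-based tuning.

ASSUMPTIONS (illustrative only, not the KnowGRRF package's code):
- each knowledge domain supplies one score per feature (higher = more relevant);
- scores are min-max normalized per domain, then combined linearly with
  non-negative weights that sum to 1;
- the composite score s_i sets a GRRF-style penalty coefficient
  lambda_i = (1 - gamma) * lambda0 + gamma * s_i;
- AIC = 2k - 2 log L, with k the number of selected features and L the
  binary likelihood of the out-of-bag (OOB) class probabilities;
- a plain grid search stands in for the actual optimizer.
"""
import itertools
import math
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Set, Tuple

Scores = Dict[str, Sequence[float]]
Weights = Dict[str, float]
# Hypothetical callback: fits a regularized forest with the given per-feature
# penalties and returns (labels, OOB probabilities of class 1, n_selected).
FitFn = Callable[[List[float]], Tuple[Sequence[int], Sequence[float], int]]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale one domain's scores to [0, 1]; constant scores all map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_score(domain_scores: Scores, weights: Weights) -> List[float]:
    """Linear combination of normalized domain scores (weights sum to 1)."""
    normalized = {d: min_max(s) for d, s in domain_scores.items()}
    n_features = len(next(iter(normalized.values())))
    return [sum(weights[d] * normalized[d][i] for d in normalized)
            for i in range(n_features)]


def penalties(score: Sequence[float], lambda0: float, gamma: float) -> List[float]:
    """Per-feature coefficients; a higher prior score gives a milder penalty."""
    return [(1 - gamma) * lambda0 + gamma * s for s in score]


def regularized_gain(gain: float, lam: float, feature: int, selected: Set[int]) -> float:
    """GRRF-style rule: only features not yet selected have their gain shrunk."""
    return gain if feature in selected else lam * gain


def oob_aic(y: Sequence[int], p_oob: Sequence[float], n_selected: int) -> float:
    """AIC from binary labels (0/1) and OOB probabilities of class 1."""
    eps = 1e-12  # guards against log(0)
    loglik = sum(math.log(max(p, eps)) if yi == 1 else math.log(max(1 - p, eps))
                 for yi, p in zip(y, p_oob))
    return 2 * n_selected - 2 * loglik


def simplex_grid(domains: List[str], step: float) -> Iterator[Weights]:
    """Yield every weight vector on a grid whose entries are >= 0 and sum to 1."""
    k = round(1 / step)
    for combo in itertools.product(range(k + 1), repeat=len(domains)):
        if sum(combo) == k:
            yield {d: c / k for d, c in zip(domains, combo)}


def search(domain_scores: Scores, gammas: Sequence[float], lambda0: float,
           fit_and_score: FitFn,
           step: float = 0.25) -> Optional[Tuple[float, Weights, float]]:
    """Return (best AIC, domain weights, gamma) over the weight/gamma grid."""
    best: Optional[Tuple[float, Weights, float]] = None
    for w in simplex_grid(list(domain_scores), step):
        s = composite_score(domain_scores, w)
        for g in gammas:
            y, p, k = fit_and_score(penalties(s, lambda0, g))
            aic = oob_aic(y, p, k)
            if best is None or aic < best[0]:  # ties keep the earlier setting
                best = (aic, w, g)
    return best
```

In a real analysis, `fit_and_score` would wrap an actual regularized random forest (for example, the KnowGRRF package named above) and return the labels, out-of-bag probabilities, and size of the selected feature subset. Under the assumed rule in `regularized_gain`, a feature's split gain is multiplied by λ_i only until the feature enters the selected set, so features with strong prior support (λ_i near 1) compete almost on their data alone, while weakly supported ones must show a larger gain before they are added.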
Pages: 10