Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery

Cited by: 17
Authors
Guan, Xin [1, 2]
Runger, George [1]
Liu, Li [1, 3, 4]
Affiliations
[1] Arizona State Univ, Coll Hlth Solut, Phoenix, AZ 85004 USA
[2] Intel Corp, Chandler, AZ 85226 USA
[3] Arizona State Univ, Biodesign Inst, Tempe, AZ 85287 USA
[4] Mayo Clin, Dept Neurol, Scottsdale, AZ 85259 USA
Keywords
Biomarker discovery; Domain knowledge; Feature selection; Regularized random forest; FEATURE-SELECTION; CANCER; PREDICTION; MODELS;
DOI
10.1186/s12859-020-3344-x
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and the algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, and no current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method, which enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.
Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the Akaike information criterion (AIC) of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy.
Conclusions: Know-GRRF is a powerful novel method to incorporate knowledge from multiple domains for feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released a KnowGRRF package in the R/CRAN archive.
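The composite-score idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and not the API of the KnowGRRF R package. The function names, the min-max scaling, and the penalty form lambda_i = 1 - gamma + gamma * score_i (borrowed from guided regularized random forests) are assumptions made for illustration, and the weights are fixed by hand rather than optimized against the AIC of out-of-bag predictions.

```python
import numpy as np


def composite_score(prior_scores, weights):
    """Combine per-domain prior scores linearly and scale the result to [0, 1].

    prior_scores: array of shape (n_features, n_domains), one column per domain.
    weights: array of shape (n_domains,), one weight per domain (hypothetical values).
    """
    prior_scores = np.asarray(prior_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = prior_scores @ weights
    span = raw.max() - raw.min()
    if span == 0:
        # All features share the same prior: treat them as equally favored.
        return np.ones_like(raw)
    return (raw - raw.min()) / span


def penalty_coefficients(score, gamma):
    """Assumed GRRF-style penalty: lambda_i = 1 - gamma + gamma * score_i.

    gamma in [0, 1] sets how strongly the prior drives regularization;
    features with a low composite score get a smaller coefficient.
    """
    return 1.0 - gamma + gamma * np.asarray(score, dtype=float)


def penalized_gain(gain, lam, already_selected):
    """Shrink the split gain of features that are not yet in the selected subset."""
    gain = np.asarray(gain, dtype=float)
    return np.where(already_selected, gain, lam * gain)


# Three features scored by two hypothetical knowledge domains.
priors = [[1.0, 0.2],
          [0.0, 0.9],
          [0.5, 0.5]]
score = composite_score(priors, weights=[0.7, 0.3])
lam = penalty_coefficients(score, gamma=0.8)
gains = penalized_gain([0.1, 0.4, 0.3], lam, already_selected=[False, True, False])
print(score, lam, gains)
```

In a full method the domain weights and gamma would be tuned jointly rather than set by hand; the sketch only shows how several prior scores could collapse into one per-feature penalty.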
Pages: 10