Annotating gene sets by mining large literature collections with protein networks

Authors
Wang, Sheng [1]
Ma, Jianzhu [2]
Yu, Michael Ku [2]
Zheng, Fan [2]
Huang, Edward W. [1]
Han, Jiawei [1]
Peng, Jian [1]
Ideker, Trey [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Calif San Diego, Sch Med, San Diego, CA 92103 USA
Keywords
text mining; functional annotations; knowledge network; gene interactions; CANCER; CELL; ENCYCLOPEDIA; EXPRESSION; MODELS
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
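The network idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how network distance is computed, so a random walk with restart is used here as an assumed stand-in for proximity, and every gene name, phrase, link, function name and parameter below is hypothetical toy data.

```python
from collections import defaultdict


def build_network(gene_phrase_links, protein_interactions):
    """Merge gene-phrase links and protein interactions into one undirected graph.

    Nodes are (kind, name) tuples so genes and phrases cannot collide.
    """
    adj = defaultdict(set)
    for gene, phrase in gene_phrase_links:
        adj[("gene", gene)].add(("phrase", phrase))
        adj[("phrase", phrase)].add(("gene", gene))
    for a, b in protein_interactions:
        adj[("gene", a)].add(("gene", b))
        adj[("gene", b)].add(("gene", a))
    return dict(adj)


def random_walk_with_restart(adj, seeds, restart=0.5, iterations=50):
    """Approximate visiting probabilities for a walker that restarts at the seeds.

    Assumption: this proximity score stands in for the "network distance"
    mentioned in the abstract; the published system may use another measure.
    """
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("none of the seed nodes are in the network")
    start = {node: 0.0 for node in adj}
    for s in seeds:
        start[s] = 1.0 / len(seeds)
    prob = dict(start)
    for _ in range(iterations):
        nxt = {node: restart * start[node] for node in adj}
        for node, p in prob.items():
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * p / len(adj[node])
            for neighbour in adj[node]:
                nxt[neighbour] += share
        prob = nxt
    return prob


def rank_phrases(adj, gene_set, restart=0.5, iterations=50):
    """Rank phrase nodes by their proximity to the query gene set."""
    seeds = [("gene", g) for g in gene_set]
    prob = random_walk_with_restart(adj, seeds, restart, iterations)
    scored = [(name, p) for (kind, name), p in prob.items() if kind == "phrase"]
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical toy data: four genes, two literature phrases.
links = [("G1", "A"), ("G2", "A"), ("G3", "B"), ("G4", "B")]
interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
network = build_network(links, interactions)
ranking = rank_phrases(network, {"G1", "G2"})
for phrase, score in ranking:
    print(phrase, round(score, 4))
```

For the query set {G1, G2}, phrase A is linked directly to both genes and ranks above phrase B, which is reachable only through the protein interaction with G3. Phrases that are never co-mentioned with a query gene can still receive a score through interacting proteins, which is the motivation for combining both link types in one network.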
Pages: 602-613
Page count: 12