Annotating gene sets by mining large literature collections with protein networks

Authors
Wang, Sheng [1]
Ma, Jianzhu [2]
Yu, Michael Ku [2]
Zheng, Fan [2]
Huang, Edward W. [1]
Han, Jiawei [1]
Peng, Jian [1]
Ideker, Trey [2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Calif San Diego, Sch Med, San Diego, CA 92103 USA
Keywords
text mining; functional annotations; knowledge network; gene interactions; CANCER; CELL; ENCYCLOPEDIA; EXPRESSION; MODELS
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Pages: 602-613
Number of pages: 12
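
The abstract describes linking genes to literature phrases, merging those links with protein interactions into one heterogeneous network, and scoring phrases by their network distance to a gene set. The toy sketch below illustrates only that general idea. It is not the authors' implementation: plain hop-count (breadth-first) distance stands in for whatever distance measure the paper uses, and all gene names, phrases, and links are made-up placeholders.

```python
from collections import deque


def build_graph(gene_phrase_links, protein_interactions):
    """Merge gene-phrase links and gene-gene interactions into one undirected graph."""
    graph = {}
    for a, b in list(gene_phrase_links) + list(protein_interactions):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_phrases(graph, phrases, gene_set):
    """Rank phrases by mean hop distance to the genes in gene_set (smaller is closer)."""
    scores = {}
    for phrase in phrases:
        dist = distances_from(graph, phrase)
        # Assumption: unreachable genes get a penalty distance of len(graph).
        total = sum(dist.get(gene, len(graph)) for gene in gene_set)
        scores[phrase] = total / len(gene_set)
    # Sort by score, then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical toy data, for illustration only.
gene_phrase_links = [
    ("geneA", "dna repair"),
    ("geneB", "dna repair"),
    ("geneD", "cell cycle"),
]
protein_interactions = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]

graph = build_graph(gene_phrase_links, protein_interactions)
ranking = rank_phrases(graph, ["dna repair", "cell cycle"], ["geneA", "geneB", "geneC"])
for phrase, score in ranking:
    print(f"{phrase}: mean distance {score:.2f}")
```

In this toy network, geneC has no direct phrase link, yet "dna repair" still ranks first for the set because geneC interacts with geneB. That is the kind of signal a combined gene-phrase and protein-interaction network can carry and a text-only link table cannot.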