Annotating gene sets by mining large literature collections with protein networks

Cited by: 0
Authors
Wang, Sheng [1 ]
Ma, Jianzhu [2 ]
Yu, Michael Ku [2 ]
Zheng, Fan [2 ]
Huang, Edward W. [1 ]
Han, Jiawei [1 ]
Peng, Jian [1 ]
Ideker, Trey [2 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Calif San Diego, Sch Med, San Diego, CA 92103 USA
Keywords
text mining; functional annotations; knowledge network; gene interactions; CANCER; CELL; ENCYCLOPEDIA; EXPRESSION; MODELS
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
Pages: 602-613
Page count: 12
Related papers
50 in total
  • [21] Integrative mining of traditional Chinese medicine literature and MEDLINE for functional gene networks
    Zhou, Xuezhong
    Liu, Baoyan
    Wu, Zhaohui
    Feng, Yi
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2007, 41 (02) : 87 - 104
  • [22] Mining Overall Sentiment in Large Sets of Opinions
    Navrat, Pavol
    Ezzeddine, Anna Bou
    Slizik, Lukas
    ADVANCES IN INTELLIGENT WEB MASTERING-2, PROCEEDINGS, 2010, 67 : 167 - 173
  • [23] An Efficient Algorithm for Mining Large Item Sets
    Zheng, Hong-Zhen
    Chu, Dian-Hui
    Zhan, De-Chen
    Xu, Xiao-Fei
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 561 - 564
  • [24] An efficient algorithm for mining large item sets
    Zheng, Hong-Zhen
    Chu, Dian-Hui
    Zhan, De-Chen
    3RD INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS, AND APPLICAT/4TH INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 2, 2006, : 151 - +
  • [25] gProt: Annotating protein interactions using Google and Gene Ontology
    Sætre, R
    Tveit, A
    Ranang, MT
    Steigedal, TS
    Thommesen, L
    Stunes, K
    Lægreid, A
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2005, 3683 : 1195 - 1203
  • [26] Active mining discriminative gene sets (invited)
    Chu, Feng
    Wang, Lipo
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2006, PROCEEDINGS, 2006, 4029 : 880 - 889
  • [27] Episode-based conceptual mining of large health collections
    Semenova, T
    CONCEPTUAL MODELING - ER 2003, PROCEEDINGS, 2003, 2813 : 579 - 581
  • [28] MatchMiner: Efficient Spanning Structure Mining in Large Image Collections
    Lou, Yin
    Snavely, Noah
    Gehrke, Johannes
    COMPUTER VISION - ECCV 2012, PT II, 2012, 7573 : 45 - 58
  • [29] ARROGANT: an application to manipulate large gene collections
    Kulkarni, AV
    Williams, NS
    Lian, Y
    Wren, JD
    Mittelman, D
    Pertsemlidis, A
    Garner, HR
    BIOINFORMATICS, 2002, 18 (11) : 1410 - 1417
  • [30] Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks
    Cao, Renzhi
    Cheng, Jianlin
    METHODS, 2016, 93 : 84 - 91