Automatic extraction of angiogenesis bioprocess from text

Cited by: 11
Authors
Wang, Xinglong [1 ,2 ]
McKendrick, Iain [3 ]
Barrett, Ian [3 ]
Dix, Ian [3 ]
French, Tim [3 ]
Tsujii, Jun'ichi [4 ]
Ananiadou, Sophia [1 ,2 ]
Affiliations
[1] Univ Manchester, Natl Ctr Text Min, Manchester, Lancs, England
[2] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[3] AstraZeneca, Alderley Pk, England
[4] Microsoft Res Asia, Beijing, Peoples R China
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
PROTEIN; MODELS;
DOI
10.1093/bioinformatics/btr460
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is to search the literature. However, bioprocesses are difficult to capture because they can be expressed in text in many different ways. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, and current techniques, which rely solely on specialized lexicons, struggle to find them.
Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, the biological process by which new blood vessels grow, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods respectively employed domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance at a lower development cost.
Pages: 2730-2737
Page count: 8