Automatic extraction of angiogenesis bioprocess from text

Times cited: 11
Authors
Wang, Xinglong [1 ,2 ]
McKendrick, Iain [3 ]
Barrett, Ian [3 ]
Dix, Ian [3 ]
French, Tim [3 ]
Tsujii, Jun'ichi [4 ]
Ananiadou, Sophia [1 ,2 ]
Affiliations
[1] Univ Manchester, Natl Ctr Text Min, Manchester, Lancs, England
[2] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[3] AstraZeneca, Alderley Pk, England
[4] Microsoft Res Asia, Beijing, Peoples R China
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
PROTEIN; MODELS;
DOI
10.1093/bioinformatics/btr460
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is to search the literature. However, bioprocesses are difficult to capture because they may be described in text by a wide variety of expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes a change to one cell or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, and current techniques, which rely solely on specialized lexicons, struggle to find them.

Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold-standard corpus in which terms and events related to angiogenesis, a key biological process in which new blood vessels grow, were annotated. Statistics on the annotated corpus revealed that over 36% of the text expressions referring to angiogenesis appeared as events. The proposed methods draw, respectively, on domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance at a lower development cost.
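As a rough illustration of why lexicon-only matching can miss event-style mentions, the sketch below does case-insensitive, whole-word dictionary matching and scores the matches against gold spans with exact-span precision, recall and F1. It is not the authors' system: the three-term `LEXICON`, the example sentence, the gold spans, and the helper names `match_terms`, `prf1` and `span_of` are all hypothetical, chosen only to show the mechanics.

```python
import re

# Hypothetical angiogenesis lexicon (illustrative only, not the paper's vocabulary).
LEXICON = ["angiogenesis", "neovascularization", "vessel sprouting"]


def match_terms(text, lexicon):
    """Return the set of (start, end) character spans where a lexicon term
    occurs as a whole word, ignoring case."""
    spans = set()
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((m.start(), m.end()))
    return spans


def prf1(predicted, gold):
    """Exact-span precision, recall and F1 of predicted spans against gold spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def span_of(text, phrase):
    """Character span of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Hypothetical example: a term mention, an event-style mention, and a multiword term.
text = "Angiogenesis requires endothelial cell migration and vessel sprouting."
gold = {span_of(text, p) for p in ["Angiogenesis", "endothelial cell migration", "vessel sprouting"]}
predicted = match_terms(text, LEXICON)
precision, recall, f1 = prf1(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # prints P=1.00 R=0.67 F1=0.80
```

On this sentence the lexicon finds "Angiogenesis" and "vessel sprouting" but not the event-style mention "endothelial cell migration", so precision is 1.0 while recall drops to 2/3. Exact-span scoring is a deliberately strict choice; a partial-overlap criterion would be more lenient.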
Pages: 2730-2737
Number of pages: 8