DISNET: a framework for extracting phenotypic disease information from public sources

Cited by: 27
Authors
Lagunes-Garcia, Gerardo [1]
Rodriguez-Gonzalez, Alejandro [1,2]
Prieto-Santamaria, Lucia [1]
Garcia del Valle, Eduardo P. [1]
Zanin, Massimiliano [1]
Menasalvas-Ruiz, Ernestina [1]
Affiliations
[1] Univ Politecn Madrid, Ctr Tecnol Biomed, Madrid, Spain
[2] Univ Politecn Madrid, Escuela Tecn Super Ingn Informat, Madrid, Spain
Source
PEERJ | 2020 / Volume 8
Keywords
Disnet framework; Natural language processing; Phenotypic information; Public sources; Disease understanding; NETWORK MEDICINE; BIOMEDICAL TEXT; WIKIPEDIA; RESOURCE; ONTOLOGY; DATABASE; GENES; INTEGRATION; ARTICLES; TOOL;
DOI
10.7717/peerj.8580
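Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code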
07 ; 0710 ; 09 ;
Abstract
Background. Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread across several information sources. Building a comprehensive dataset of diseases and their clinical manifestations from public sources makes it possible not only to complement and merge medical knowledge but also to extend it, and thereby to interconnect existing data and to analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge about signs and symptoms from medical databases, and to enable the creation of customisable disease networks. Methods. We present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from the Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. Results. We further present the validation of our system on Wikipedia and PubMed texts and report the accuracy obtained. The final output includes a comprehensive symptom-disease dataset, shared with free access through the system's API. Finally, we describe, with some simple use cases, how a user can interact with the system and extract information for subsequent analyses. Discussion. DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific disease category (it covers all the categories offered by the selected information sources) or to a specific type of clinical diagnosis term. It further allows tracking the evolution of those terms through time, and thus offers an opportunity to analyse and observe the progress of human knowledge on diseases.
We further discuss the validation of the system, which suggests that it is good enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system's reliability.
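As a rough illustration of how a disease network might be derived from a symptom-disease dataset of the kind the abstract describes, the sketch below links two diseases when their sets of phenotypic terms overlap sufficiently. Everything in it is an assumption made for illustration: the toy data, the function names `jaccard` and `build_disease_network`, the choice of Jaccard similarity, and the threshold. It does not call the DISNET API and is not the authors' method.

```python
from itertools import combinations

# Hypothetical toy data (not taken from DISNET): disease -> phenotypic terms.
disease_terms = {
    "influenza": {"fever", "cough", "myalgia", "fatigue"},
    "common cold": {"cough", "sneezing", "sore throat"},
    "malaria": {"fever", "chills", "fatigue", "headache"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_disease_network(terms_by_disease, threshold=0.2):
    """Return weighted edges (disease_1, disease_2, similarity).

    Two diseases are linked when the similarity of their term sets
    reaches the (assumed, illustrative) threshold.
    """
    edges = []
    for d1, d2 in combinations(sorted(terms_by_disease), 2):
        weight = jaccard(terms_by_disease[d1], terms_by_disease[d2])
        if weight >= threshold:
            edges.append((d1, d2, weight))
    return edges


edges = build_disease_network(disease_terms)
for d1, d2, weight in edges:
    print(f"{d1} -- {d2}: {weight:.2f}")
```

Raising or lowering `threshold` is one simple way to make such a network "customisable": a stricter value keeps only strongly overlapping disease pairs.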
Pages: 34