Flexible data integration and curation using a graph-based approach

Cited by: 6
Authors
Croset, Samuel [1]
Rupp, Joachim [1]
Romacker, Martin [1]
Affiliations
[1] F Hoffmann La Roche & Cie AG, Roche Innovat Ctr Basel, CH-4070 Basel, Switzerland
DOI
10.1093/bioinformatics/btv644
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: The increasing diversity of data available to the biomedical scientist holds promise for better understanding of diseases and discovery of new treatments for patients. In order to provide a complete picture of a biomedical question, data from many different origins need to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse, and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize the resources for data integration projects as available repositories are growing both in size and in number every day.
Results: We present a new generic methodology to identify problematic records, causing what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in social sciences: the graph density and the betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses.
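The two metrics named in the abstract can be illustrated on a toy graph. The sketch below is not the authors' implementation: it uses the textbook definitions of undirected graph density, 2m / (n(n - 1)), and of unnormalised betweenness centrality (computed here with Brandes' algorithm). The record names (a1..a4, b1..b4, x) and the scenario are hypothetical: two tightly linked groups of records are joined only through a single ambiguous record x, which is assumed to stand in for the kind of problematic record the abstract describes.

```python
from collections import deque
from itertools import combinations


def density(nodes, edges):
    """Undirected graph density: 2m / (n * (n - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2.0 * len(edges) / (n * (n - 1))


def betweenness(nodes, edges):
    """Unnormalised betweenness centrality (Brandes, unweighted, undirected)."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical toy data: two fully linked groups of records.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
# One ambiguous record "x" links the two groups together.
edges += [("a1", "x"), ("x", "b1")]
nodes = group_a + group_b + ["x"]

d_all = density(nodes, edges)
bc = betweenness(nodes, edges)
suspect = max(bc, key=bc.get)

# Density of one group on its own, once the bridging record is set aside.
d_group_a = density(group_a, list(combinations(group_a, 2)))

print("density with bridge:", round(d_all, 3))
print("density of group A alone:", d_group_a)
print("highest betweenness:", suspect, bc[suspect])
```

In this toy case the merged graph is much sparser than either group taken alone, and the bridging record carries the highest betweenness, so it would be the first candidate for review. How the paper thresholds or combines the two measures is not stated in the abstract and is not assumed here.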
Pages: 918-925
Number of pages: 8