Data Science for Genomic Data Management: Challenges, Resources, Experiences

Authors
Ceri S. [1]
Pinoli P. [1]
Affiliations
[1] Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, Milan
Funding
EU Horizon 2020; European Research Council
Keywords
Data driven genomic computing; Data scarcity; Genomic data science; Genomic datasets;
DOI
10.1007/s42979-019-0005-0
Abstract
We highlight several challenges faced by data scientists who use public datasets to solve biological and clinical problems. Despite the large efforts invested in building such public datasets, they are dispersed over many sources, heterogeneous in their formats and sequencing/calling techniques, and often of highly variable quality. Moreover, for most research questions, scientists can hardly find datasets with enough samples for building and training machine learning models. Data scarcity depends on the complexity of the genomic domain, with its many facets, as well as on the lack of organic initiatives to provide standardization across communities. In this paper, we discuss our approach to genomic data management, which can strongly mitigate the problems of data dispersion and format heterogeneity through high-level abstractions for genomics. We briefly present the computational resources recently developed by the GeCo project (ERC Advanced Grant); they include GDM, a Genomic Data Model providing interoperability across data formats; GMQL, a genometric query language for answering data science queries over genomic datasets; and an in-house integrated repository providing attribute-based and keyword-based search over normalized metadata from several open data repositories. We describe these resources at work on a specific research question, and we highlight how we managed to produce a model addressing that question by overcoming the lack of sufficient samples and labelled datasets. © 2019, Springer Nature Singapore Pte Ltd.