ReUseData: an R/Bioconductor tool for reusable and reproducible genomic data management

Authors
Liu, Qian [1]
Hu, Qiang [1]
Liu, Song [1]
Hutson, Alan [1]
Morgan, Martin [1]
Affiliations
[1] Roswell Park Comprehensive Cancer Center, Department of Biostatistics and Bioinformatics, Buffalo, NY 14263, USA
Keywords
Genomic data; Data reusability; Data reproducibility; Data management; Common Workflow Language
DOI
10.1186/s12859-023-05626-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The increasing volume and complexity of genomic data pose significant challenges for effective data management and reuse. Public genomic data often undergo similar preprocessing across projects, leading to redundant or inconsistent datasets and inefficient use of computing resources. This is especially pertinent for bioinformaticians engaged in multiple projects. Tools have been created to address challenges in managing and accessing curated genomic datasets; however, such tools are most useful to users who work with specific types of data or who prefer a particular programming language. Currently, an R-specific solution for efficient data management and versatile data reuse is lacking.
Results: Here we present ReUseData, an R software tool that overcomes some of the limitations of existing solutions and provides a versatile and reproducible approach to effective data management within R. ReUseData facilitates the transformation of ad hoc data preprocessing scripts into Common Workflow Language (CWL)-based data recipes, allowing for the reproducible generation of curated data files in their generic formats. The data recipes are standardized and self-contained, making them easily portable and reproducible across various computing platforms. ReUseData also streamlines the reuse of curated data files and their integration into downstream analysis tools and workflows with different frameworks.
Conclusions: ReUseData provides a reliable and reproducible approach for genomic data management within the R environment to enhance the accessibility and reusability of genomic data. The package is available at Bioconductor (https://bioconductor.org/packages/ReUseData/) with additional information on the project website (https://rcwl.org/dataRecipes/).
Pages: 9
Journal
BMC Bioinformatics, 25