A High-Performance Database Management System for Managing and Analyzing Large-Scale SNP Data in Plant Genotyping and Breeding Applications

Cited by: 2
Authors
Zhao, Yikun [1]
Jiang, Bin [1]
Huo, Yongxue [1]
Yi, Hongmei [1]
Tian, Hongli [1]
Wu, Haotian [1]
Wang, Rui [1]
Zhao, Jiuran [1]
Wang, Fengge [1]
Affiliations
[1] Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Maize Research Center, Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing 100097, People's Republic of China
Source
AGRICULTURE-BASEL | 2021 / Vol. 11 / Issue 11
Funding
National Key Research and Development Program of China;
Keywords
SNP; SNP array; KASP; database; DNA fingerprint; algorithms; genotyping; DNA; DIVERSITY; SEQUENCE; BARCODE;
DOI
10.3390/agriculture11111027
Chinese Library Classification number
S3 [Agronomy];
Discipline classification code
0901;
Abstract
A DNA fingerprint database is an efficient, stable, and automated tool for plant molecular research that can provide comprehensive technical support for multiple fields of study, such as pan-genome analysis and crop breeding. However, constructing a DNA fingerprint database for plants requires significant resources for data output, storage, analysis, and quality control, and large amounts of heterogeneous data must be processed efficiently and accurately. We therefore developed the plant SNP database management system (PSNPdms) using an open-source web server and free software. It is compatible with single nucleotide polymorphism (SNP) and insertion-deletion (InDel) markers, the Kompetitive Allele Specific PCR (KASP) and SNP array platforms, and 23 species. It fully integrates with the KASP platform and allows graphical presentation and modification of KASP data. The system has a simple, efficient, and versatile laboratory personnel management structure that adapts to complex and changing experimental needs through a simple workflow. PSNPdms supports data quality control along multiple dimensions, such as standardized experimental design, standard reference samples, a fingerprint statistical selection algorithm, and raw data correlation queries. In addition, we developed a fingerprint-merging algorithm to solve the problem of merging fingerprints of mixed samples and single samples in plant detection, providing a unique standard fingerprint for each plant variety for the construction of a standard DNA fingerprint database. Different laboratories can use the system to generate fingerprint packages for data interaction and sharing. We also integrated genetic analysis into the system to enable the drawing and downloading of dendrograms. PSNPdms has been used by 23 institutions and has proven to be a stable and effective system for sharing data and performing genetic analysis. Interested researchers are invited to adapt and further develop the system.
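The abstract does not describe how the fingerprint-merging algorithm works, so the following is only a minimal sketch of the general idea of collapsing several fingerprints of one sample into a single consensus fingerprint. It assumes a simple per-locus majority vote, which is a stand-in and not the PSNPdms method; the `MISSING` code, the `merge_fingerprints` function, its `min_agreement` parameter, and the locus names are all hypothetical.

```python
from collections import Counter

MISSING = "--"  # hypothetical code for a failed or absent genotype call


def normalize(call):
    """Sort the two alleles so that 'GA' and 'AG' compare equal."""
    if call == MISSING:
        return call
    return "".join(sorted(call.upper()))


def merge_fingerprints(fingerprints, min_agreement=0.5):
    """Merge replicate fingerprints into one consensus call per locus.

    fingerprints: list of dicts mapping locus name -> two-letter genotype call.
    A locus receives a consensus call only if the most common non-missing
    call accounts for more than min_agreement of the non-missing calls;
    otherwise the locus is reported as MISSING.
    (Assumption: majority voting, not the algorithm used by PSNPdms.)
    """
    loci = sorted({locus for fp in fingerprints for locus in fp})
    merged = {}
    for locus in loci:
        calls = [normalize(fp.get(locus, MISSING)) for fp in fingerprints]
        calls = [c for c in calls if c != MISSING]
        if not calls:
            merged[locus] = MISSING
            continue
        call, count = Counter(calls).most_common(1)[0]
        merged[locus] = call if count / len(calls) > min_agreement else MISSING
    return merged


# Three hypothetical replicate fingerprints of the same sample.
replicates = [
    {"snp1": "AA", "snp2": "AG", "snp3": "--"},
    {"snp1": "AA", "snp2": "GA", "snp3": "CC"},
    {"snp1": "AT", "snp2": "AG"},
]
print(merge_fingerprints(replicates))
```

In this sketch a locus with a tied vote is left as missing instead of being resolved arbitrarily, which keeps ambiguous calls out of the consensus fingerprint; a production system would likely weigh call quality and sample type as well.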
Pages: 21