New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION;
DOI
10.26508/lsa.202201719
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
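The abstract mentions minimizers selected by a hash function derived from the k-mer distribution. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes the "hash" is simply the k-mer's count across all reads (rarer k-mers rank first, ties broken alphabetically), and the function names `kmer_counts` and `minimizers`, the window parameter `w`, and the toy reads are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads (the assumed 'k-mer distribution')."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def minimizers(read, k, w, counts):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Each window of w consecutive k-mers contributes the k-mer with the
    smallest rank. The rank is an assumption made for illustration:
    (count in the dataset, k-mer string), so rare k-mers are preferred.
    Reads with fewer than w k-mers yield no minimizers.
    """
    def rank(kmer):
        return (counts.get(kmer, 0), kmer)

    n_kmers = len(read) - k + 1
    selected = set()
    for start in range(max(n_kmers - w + 1, 0)):
        window = [(rank(read[i:i + k]), i) for i in range(start, start + w)]
        _, pos = min(window)  # leftmost position wins on equal rank
        selected.add(pos)
    return [(pos, read[pos:pos + k]) for pos in sorted(selected)]


if __name__ == "__main__":
    reads = ["ACGTACGTGA", "CGTACGTGAT"]  # toy overlapping reads
    counts = kmer_counts(reads, k=3)
    for read in reads:
        print(read, minimizers(read, k=3, w=3, counts=counts))
```

Reads that share minimizers are candidates for overlap; in a graph with two vertices per read (presumably one for each read end), such overlaps would become edges. Ranking by dataset-wide counts rather than a fixed hash is one plausible reading of "derived from the k-mer distribution", chosen here because it steers selection away from repetitive k-mers.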
Pages: 13