New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Times cited: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION;
DOI
10.26508/lsa.202201719
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
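To make the minimizer step mentioned in the abstract more concrete, the following is a minimal sketch of window minimizers ranked by a count-based ordering. It is not the authors' implementation: the function names (`kmers`, `minimizers`), the toy reads, the values of `k` and `w`, and the ranking rule (rarer k-mers first, ties broken lexicographically) are all assumptions chosen for illustration. The abstract only states that the hash function is derived from the k-mer distribution.

```python
from collections import Counter


def kmers(seq, k):
    """Return all k-mers of seq in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, w, counts):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumption: k-mers are ranked by their count in the read set (rarer
    first), with the k-mer string as a tie-breaker. The real ranking used
    by the paper is not described in the abstract.
    """
    kms = kmers(seq, k)
    chosen = set()
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        # Lowest (count, k-mer, position) in the window is the minimizer.
        best = min(window, key=lambda i: (counts[kms[i]], kms[i], i))
        chosen.add((best, kms[best]))
    return sorted(chosen)


# Two toy reads that overlap; the second starts 2 bases into the first.
reads = ["ACGTACGTGA", "GTACGTGACC"]
k, w = 4, 3

# k-mer distribution over the whole (toy) read set.
counts = Counter(km for r in reads for km in kmers(r, k))

sketches = [minimizers(r, k, w, counts) for r in reads]

# Minimizers present in both reads are evidence of a candidate overlap.
shared = {km for _, km in sketches[0]} & {km for _, km in sketches[1]}
for pos, km in sketches[0]:
    if km in shared:
        print(km, "read 1 position", pos)
```

In this toy setting, reads that share minimizers would become candidates for an edge in the overlap graph. The edge statistics and likelihood ranking described in the abstract are not modeled here.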
Pages: 13
Related papers
50 items in total
  • [41] Nanopore sequencing and assembly of a human genome with ultra-long reads
    Jain, Miten
    Koren, Sergey
    Miga, Karen H.
    Quick, Josh
    Rand, Arthur C.
    Sasani, Thomas A.
    Tyson, John R.
    Beggs, Andrew D.
    Dilthey, Alexander T.
    Fiddes, Ian T.
    Malla, Sunir
    Marriott, Hannah
    Nieto, Tom
    O'Grady, Justin
    Olsen, Hugh E.
    Pedersen, Brent S.
    Rhie, Arang
    Richardson, Hollian
    Quinlan, Aaron R.
    Snutch, Terrance P.
    Tee, Louise
    Paten, Benedict
    Phillippy, Adam M.
    Simpson, Jared T.
    Loman, Nicholas J.
    Loose, Matthew
    NATURE BIOTECHNOLOGY, 2018, 36 (04) : 338 - +
  • [42] I-CONVEX: Fast and Accurate de Novo Transcriptome Recovery from Long Reads
    Baharlouei, Sina
    Razaviyayn, Meisam
    Tseng, Elizabeth
    Tse, David
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 1753 : 339 - 363
  • [43] Current challenges in de novo plant genome sequencing and assembly
    Michael C Schatz
    Jan Witkowski
    W Richard McCombie
    Genome Biology, 13
  • [44] Current challenges in de novo plant genome sequencing and assembly
    Schatz, Michael C.
    Witkowski, Jan
    McCombie, W. Richard
    GENOME BIOLOGY, 2012, 13 (04):
  • [45] De novo genome assembly for third generation sequencing data
    Forc, Mateusz
    Kusmirek, Wiktor
    Nowak, Robert M.
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2018, 2018, 10808
  • [46] Next generation sequencing under de novo genome assembly
    Nimmy, Sonia Farhana
    Kamal, M. S.
    INTERNATIONAL JOURNAL OF BIOMATHEMATICS, 2015, 8 (05)
  • [47] Efficient data structures for mobile de novo genome assembly by third-generation sequencing
    Milicchio, Franco
    Prosperi, Mattia
    14TH INTERNATIONAL CONFERENCE ON MOBILE SYSTEMS AND PERVASIVE COMPUTING (MOBISPC 2017) / 12TH INTERNATIONAL CONFERENCE ON FUTURE NETWORKS AND COMMUNICATIONS (FNC 2017) / AFFILIATED WORKSHOPS, 2017, 110 : 440 - 447
  • [48] De novo assembly of a new Olea europaea genome accession using nanopore sequencing
    Rao, Guodong
    Zhang, Jianguo
    Liu, Xiaoxia
    Lin, Chunfu
    Xin, Huaigen
    Xue, Li
    Wang, Chenhe
    HORTICULTURE RESEARCH, 2021, 8 (01)
  • [49] phasebook: haplotype-aware de novo assembly of diploid genomes from long reads
    Xiao Luo
    Xiongbin Kang
    Alexander Schönhuth
    Genome Biology, 22
  • [50] Hybrid error correction and de novo assembly of single-molecule sequencing reads
    Sergey Koren
    Michael C Schatz
    Brian P Walenz
    Jeffrey Martin
    Jason T Howard
    Ganeshkumar Ganapathy
    Zhong Wang
    David A Rasko
    W Richard McCombie
    Erich D Jarvis
    Adam M Phillippy
    Nature Biotechnology, 2012, 30 : 693 - 700