New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

Cited by: 3
Authors
Gonzalez-Garcia, Laura [1]
Guevara-Barrientos, David [1]
Lozano-Arce, Daniela [1]
Gil, Juanita [2]
Diaz-Riano, Jorge [1]
Duarte, Erick [1]
Andrade, German [1]
Camilo Bojaca, Juan [1]
Camila Hoyos-Sanchez, Maria [1]
Chavarro, Christian [1]
Guayazan, Natalia [3]
Chica, Luis Alberto [4, 5]
Buitrago Acosta, Maria Camila [1]
Bautista, Edwin [1]
Trujillo, Miller [1]
Duitama, Jorge [1]
Affiliations
[1] Univ Andes, Syst & Comp Engn Dept, Bogota, Colombia
[2] Univ Arkansas, Dept Entomol & Plant Pathol, Fayetteville, AR USA
[3] Univ Andes, Dept Biol Sci, Bogota, Colombia
[4] Univ Andes, Dept Biol Sci, Res Grp Computat Biol & Microbial Ecol, Bogota, Colombia
[5] Univ Andes, Max Planck Tandem Grp Computat Biol, Bogota, Colombia
Keywords
SINGLE; ANNOTATION
DOI
10.26508/lsa.202201719
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
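To illustrate the minimizer idea mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple ordering in which k-mers that are rarer across the reads rank lower (ties broken lexicographically) as a stand-in for the "hash function derived from the k-mer distribution"; the exact function used in the paper is not given here. The names `build_rank`, `minimizers`, and `shared_minimizers` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_rank(reads, k):
    """Assumed ordering: rarer k-mers rank lower; ties broken lexicographically."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    ordered = sorted(counts, key=lambda km: (counts[km], km))
    return {km: i for i, km in enumerate(ordered)}


def minimizers(seq, k, w, rank):
    """Sorted (position, k-mer) pairs: the lowest-ranked k-mer in each
    window of w consecutive k-mers. Unknown k-mers rank last."""
    kms = kmers(seq, k)
    selected = set()
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (rank.get(kms[i], len(rank)), i))
        selected.add((best, kms[best]))
    return sorted(selected)


def shared_minimizers(a, b, k, w, rank):
    """K-mers selected as minimizers in both reads (overlap evidence)."""
    in_a = {km for _, km in minimizers(a, k, w, rank)}
    in_b = {km for _, km in minimizers(b, k, w, rank)}
    return in_a & in_b


if __name__ == "__main__":
    # Two toy reads that overlap on "CGTTGCA".
    reads = ["ACGTACGTTGCA", "CGTTGCAAGGCT"]
    rank = build_rank(reads, k=4)
    print(minimizers(reads[0], 4, 3, rank))
    print(shared_minimizers(reads[0], reads[1], 4, 3, rank))
```

Because the ordering depends only on the k-mer itself, two reads that share a region spanning at least one full window select the same minimizer there, which is what lets shared minimizers serve as candidate evidence for an overlap edge between reads.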
Pages: 13