Assembly-free discovery of human novel sequences using long reads

Authors
Li, Qiuhui [1 ]
Yan, Bin [1 ]
Lam, Tak-Wah [1 ]
Luo, Ruibang [1 ]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
long reads; novel sequences; assembly-free approach; human references; structural variation; human genome
DOI
10.1093/dnares/dsac039
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
DNA sequences that are absent from the human reference genome are classified as novel sequences. Discovering these missing sequences is crucial for exploring the genomic diversity of populations and for understanding the genetic basis of human diseases. However, the read lengths produced by different sequencing technologies can significantly affect which novel sequences are detected. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technology long reads. Among the novel sequences newly detected by AF-NS, more than 95% were missed by long-read assemblers and 85% were not present in Illumina short reads. We identified the novel sequences common to all samples and revealed their association with transcription factor binding motifs. Regarding the placement of the novel sequences, we found that about 70% were enriched in repeat regions, and we identified 430 for one specific subpopulation that might be related to its evolution. Our study demonstrates the advantage of the assembly-free approach over assembler-based methods in capturing more novel sequences. Combining long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.
Pages: 10