Comparative analysis of metagenomic classifiers for long-read sequencing datasets

Cited by: 8
Authors
Maric, Josip [1]
Krizanovic, Kresimir [1]
Riondet, Sylvain [2,3]
Nagarajan, Niranjan [2,3]
Sikic, Mile [1,2]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Unska 3, Zagreb 10000, Croatia
[2] ASTAR, Genome Inst Singapore GIS, 60 Biopolis St, Singapore 138672, Singapore
[3] Natl Univ Singapore, Yong Loo Lin Sch Med, Singapore 117596, Singapore
Keywords
Metagenomics; Long sequenced reads; Classification; Benchmark; Abundance
DOI
10.1186/s12859-024-05634-8
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools at the species taxonomic level. We analysed kmer-based tools, mapping-based tools and two general-purpose long-read mappers. We evaluated more than 20 pipelines that use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundance varying from 0.0001 to 20%, and six real gut microbiomes.

Results: General-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than the best-performing classification tools. They were up to ten times slower than the fastest kmer-based tools requiring up to four times less RAM. All tested tools were prone to report organisms not present in the datasets, except CLARK-S, and they underperformed when the host's genetic material was highly abundant. Tools that use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but because read length distributions differ among species, using only the longest reads reduced the accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for tools of the same type, but discordance between types in the number of reported organisms and their abundances. Most assessments showed the influence of database completeness on the reports.

Conclusion: The findings indicate that kmer-based tools are well suited for rapid analysis of long-read data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.
Pages: 26
Published in: BMC Bioinformatics, 25