Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression

Cited by: 14
Authors
Tang, Kujin [1]
Ren, Jie [1]
Sun, Fengzhu [1]
Affiliations
[1] University of Southern California, Quantitative and Computational Biology Program, Department of Biological Sciences, Los Angeles, CA 90007 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Alignment-free; Neural network regression; kmer; d2*; d2S; NGS; Bias adjustment; Dissimilarity measures
DOI
10.1186/s13059-019-1872-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.
Pages: 17
Related papers
50 in total
  • [21] Identifying SNARE Proteins Using an Alignment-Free Method Based on Multiscan Convolutional Neural Network and PSSM Profiles
    Le, Nguyen Quoc Khanh
    Kha, Quang-Hien
    Ho, Quang-Thai
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2022, 62 (19) : 4820 - 4826
  • [22] Alignment-free sequence comparison using joint frequency and position information of k-words
    Han, Gyu-Bum
    Chung, Byung Chang
    Cho, Dong-Ho
    2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017, : 3880 - 3883
  • [23] Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
    Soares, Ines
    Goios, Ana
    Amorim, Antonio
    SCIENTIFIC WORLD JOURNAL, 2012,
  • [24] Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches
    Horwege, Sebastian
    Lindner, Sebastian
    Boden, Marcus
    Hatje, Klas
    Kollmar, Martin
    Leimeister, Chris-Andre
    Morgenstern, Burkhard
    NUCLEIC ACIDS RESEARCH, 2014, 42 (W1) : W7 - W11
  • [25] Alignment-free sequence comparison method based on whole genomes and its application to virus phylogeny
    College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
    Unknown
    Tien Tzu Hsueh Pao, 2006, 2 (277-281):
  • [26] Protein map: An alignment-free sequence comparison method based on various properties of amino acids
    Yu, Chenglong
    Cheng, Shiu-Yuen
    He, Rong L.
    Yau, Stephen S. -T.
    GENE, 2011, 486 (1-2) : 110 - 118
  • [27] Alignment-free genome sequence comparison method based on pair transition difference of k-words
    Han, Gyu-Bum
    Cho, Dong-Ho
    2017 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL & HEALTH INFORMATICS (BHI), 2017, : 45 - 48
  • [28] GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison
    Faisal, Fazle E.
    Newaz, Khalique
    Chaney, Julie L.
    Li, Jun
    Emrich, Scott J.
    Clark, Patricia L.
    Milenkovic, Tijana
    SCIENTIFIC REPORTS, 2017, 7
  • [29] GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison
    Fazle E. Faisal
    Khalique Newaz
    Julie L. Chaney
    Jun Li
    Scott J. Emrich
    Patricia L. Clark
    Tijana Milenković
    Scientific Reports, 7
  • [30] An Alignment-Free Regression Approach for Estimating Allele-Specific Expression Using RNA-Seq Data
    Fu, Chen-Ping
    Jojic, Vladimir
    McMillan, Leonard
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB2014, 2014, 8394 : 69 - 84