Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

Cited by: 1
Authors
Wang, Wei [1]
Smith, Jack [1]
Hejase, Hussein A. [2]
Liu, Kevin J. [1]
Affiliations
[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[2] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, POB 100, Cold Spring Harbor, NY 11724 USA
Funding
National Science Foundation (USA);
Keywords
Statistical support; Non-parametric; Semi-parametric; Resampling; Bootstrap; Multiple sequence alignment; Random walk; MULTIPLE; RELIABILITY; ALIGNMENTS;
DOI
10.1186/s13015-020-00167-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors. To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes "Heads-or-Tails" mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or "SEquential RESampling") method. To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods.
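The contrast the abstract draws between the standard bootstrap and a sequential random walk can be illustrated with a small sketch. It is not the authors' SERES implementation: the function names, the uniformly random start site, the one-site step, the `reverse_prob` parameter, and the reflection at the two sequence ends are all assumptions made for illustration, and only the aligned-sequence case is shown.

```python
import random


def bootstrap_columns(n_sites, rng):
    """Standard bootstrap: draw n_sites column indices i.i.d. with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def random_walk_columns(n_sites, length, reverse_prob, rng):
    """Illustrative sequential resampling by a random walk over column indices.

    Assumptions (not taken from the abstract): the walk starts at a uniformly
    random site, moves one site per step, reverses direction with probability
    reverse_prob, and always reflects at the first and last site.
    """
    if n_sites < 2:
        raise ValueError("need at least two sites to walk")
    pos = rng.randrange(n_sites)
    step = rng.choice([-1, 1])
    walk = [pos]
    while len(walk) < length:
        if rng.random() < reverse_prob:
            step = -step              # random reversal
        if not 0 <= pos + step < n_sites:
            step = -step              # reflect at either end
        pos += step
        walk.append(pos)
    return walk


def resample(alignment, columns):
    """Build a replicate alignment from the chosen column indices."""
    return ["".join(row[c] for c in columns) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    alignment = ["ACGT-A", "ACGTTA", "AC-TTA"]  # toy aligned sequences
    n = len(alignment[0])
    print(resample(alignment, bootstrap_columns(n, rng)))
    print(resample(alignment, random_walk_columns(n, 2 * n, 0.1, rng)))
```

In the bootstrap replicate, neighbouring columns are unrelated draws, whereas in the random-walk replicate every sampled column is adjacent in the input to the one sampled before it, so local sequential dependence is carried into the replicate.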
Pages: 15
Related papers
50 in total
  • [1] Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences
    Wei Wang
    Jack Smith
    Hussein A. Hejase
    Kevin J. Liu
    Algorithms for Molecular Biology, 15
  • [2] Non-parametric and Semi-parametric Support Estimation Using SEquential RESampling Random Walks on Biomolecular Sequences
    Wang, Wei
    Smith, Jack
    Hejase, Hussein A.
    Liu, Kevin J.
    COMPARATIVE GENOMICS (RECOMB-CG 2018), 2018, 11183 : 294 - 308
  • [3] Density estimation using non-parametric and semi-parametric mixtures
    Wang, Yong
    Chee, Chew-Seng
    STATISTICAL MODELLING, 2012, 12 (01) : 67 - 92
  • [4] Non-parametric and semi-parametric asset pricing
    Erdos, Peter
    Ormos, Mihaly
    Zibriczky, David
    ECONOMIC MODELLING, 2011, 28 (03) : 1150 - 1162
  • [5] Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component
    Ma, Jun
    Gudlaugsdottir, Sigurbjorg
    Wood, Graham
    STATISTICS AND COMPUTING, 2011, 21 (04) : 601 - 612
  • [6] Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component
    Jun Ma
    Sigurbjorg Gudlaugsdottir
    Graham Wood
    Statistics and Computing, 2011, 21 : 601 - 612
  • [7] ESTIMATED NON-PARAMETRIC AND SEMI-PARAMETRIC MODEL FOR LONGITUDINAL DATA
    AL-Adilee, Reem Tallal Kamil
    Aboudi, Emad Hazim
    INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2021, 17 : 1963 - 1972
  • [8] Semi-parametric and Non-parametric Term Weighting for Information Retrieval
    Metzler, Donald
    Zaragoza, Hugo
    ADVANCES IN INFORMATION RETRIEVAL THEORY, 2009, 5766 : 42 - 53
  • [9] Cumulative estimation in semi-parametric models - (Non-parametric estimator base for a general weight function)
    Hu, HC
    Sun, HY
    SURVEY REVIEW, 2005, 38 (296) : 158 - 164
  • [10] Comparison of non-parametric and semi-parametric tests in detecting long memory
    Boutahar, Mohamed
    JOURNAL OF APPLIED STATISTICS, 2009, 36 (09) : 945 - 972