ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Cited by: 262
Authors
Steenwyk, Jacob L. [1]
Buida, Thomas J., III
Li, Yuanning [1]
Shen, Xing-Xing [2]
Rokas, Antonis [1]
Affiliations
[1] Vanderbilt Univ, Dept Biol Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Zhejiang Univ, Key Lab Mol Biol Crop Pathogens & Insects, Minist Agr, Inst Insect Sci, Hangzhou, Peoples R China
Funding
National Institutes of Health (NIH); National Science Foundation (NSF)
Keywords
TREE; COALESCENT; PLACEMENT; SISTER; SITES; RATES; TOOL
DOI
10.1371/journal.pbio.3001007
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
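The core idea above is to keep parsimony-informative sites rather than hunt for divergent ones to remove. Under the standard definition, a site is parsimony-informative when at least two character states each occur in at least two sequences.

The Python below is a minimal, hypothetical sketch of that criterion. It is not ClipKIT's code or command-line interface. The function names `is_parsimony_informative` and `keep_parsimony_informative` are illustrative. Treating `-` and `?` as missing data is also an assumption made for this example. The published tool's trimming modes may handle gaps and constant sites differently.

```python
from collections import Counter

# Characters treated as missing data (an assumption for this sketch, not ClipKIT's rule).
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two character states each occur in at least
    two sequences of this alignment column; missing data are ignored."""
    counts = Counter(ch.upper() for ch in column if ch not in MISSING)
    states_seen_twice = [state for state, n in counts.items() if n >= 2]
    return len(states_seen_twice) >= 2


def keep_parsimony_informative(alignment):
    """Hypothetical helper: given equal-length aligned sequences, return the
    trimmed alignment and the indices of the parsimony-informative columns kept."""
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    kept = [
        i for i in range(length)
        if is_parsimony_informative([seq[i] for seq in alignment])
    ]
    # Rebuild each sequence from the retained columns only.
    trimmed = ["".join(seq[i] for i in kept) for seq in alignment]
    return trimmed, kept


if __name__ == "__main__":
    msa = [
        "ATGCA-",
        "ATGCT-",
        "ACGTAA",
        "ACCTTA",
    ]
    trimmed, kept = keep_parsimony_informative(msa)
    print(kept)
    print(trimmed)
```

Running the script prints `[1, 3, 4]` and then `['TCA', 'TCT', 'CTA', 'CTT']`. Three columns are dropped, for different reasons:

- Column 0 is constant.
- Column 2 differs only by a singleton (`C`).
- Column 5 has a single state once gaps are ignored.

None of these can favor one tree topology over another under parsimony.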
Pages: 17