An Efficient Greedy Incremental Sequence Clustering Algorithm

Authors
Ju, Zhen [1, 2]
Zhang, Huiling [1, 2]
Meng, Jingtao [2]
Zhang, Jingjing [1, 2]
Li, Xuelei [2]
Fan, Jianping [2]
Pan, Yi [2]
Liu, Weiguo [3]
Wei, Yanjie [2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Shenzhen Inst Adv Technol, Chinese Acad Sci, Shenzhen 518005, Peoples R China
[3] Shandong Univ, Jinan 250100, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Greedy incremental alignment; OneAPI; Gene clustering; Filtering; CD-HIT; PROTEIN;
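DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract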
Gene sequence clustering is a fundamental task in computational biology and bioinformatics, supporting the study of phylogenetic relationships, gene function prediction, and related problems. As the volume of biological data (gene and protein sequences) grows rapidly, clustering faces increasing challenges in both efficiency and precision. For example, gene databases contain many redundant sequences that provide no useful information but consume computing resources. Widely used greedy incremental clustering tools improve efficiency at the cost of precision. To obtain a gene clustering algorithm that is both fast and precise, we propose a modified greedy incremental sequence clustering tool that introduces a pre-filter, a modified short word filter, a new data packing strategy, and GPU acceleration. Experimental evaluations on four independent datasets show that the proposed tool clusters the datasets with a precision of 99.99%. Compared with the results of CD-HIT, Uclust, and Vsearch, the proposed method leaves four orders of magnitude fewer redundant sequences. In addition, on the same hardware platform, our tool is 40% faster than the second-fastest tool. The software is available at https://github.com/SIAT-HPCC/gene-sequence-clustering.
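The general idea of greedy incremental clustering with a short word filter, as named in the abstract, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pre-filter, the data packing strategy, and the GPU acceleration are omitted, and a plain edit distance stands in for sequence alignment. The function names (`kmer_counts`, `edit_distance`, `greedy_cluster`) and the parameters `min_identity_pct` and `k` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Multiset of all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming.

    Assumption: used here as a simple stand-in for a real alignment.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def greedy_cluster(seqs, min_identity_pct=90, k=3):
    """Greedy incremental clustering; returns {representative: [members]}."""
    clusters = {}   # representative -> list of member sequences
    rep_words = {}  # representative -> its k-mer multiset
    # Longest sequences are processed first, so a representative is never
    # shorter than a sequence that is later compared against it.
    for seq in sorted(seqs, key=len, reverse=True):
        words = kmer_counts(seq, k)
        for rep in clusters:
            longest = len(rep)
            max_edits = longest * (100 - min_identity_pct) // 100
            # Short word filter (q-gram lemma): two strings within max_edits
            # edits share at least this many words of length k.
            needed = longest - k + 1 - k * max_edits
            shared = sum((words & rep_words[rep]).values())
            if shared < needed:
                continue  # cannot reach the threshold; skip the comparison
            if edit_distance(seq, rep) <= max_edits:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq] = [seq]
            rep_words[seq] = words
    return clusters
```

In this sketch the filter only discards pairs that cannot satisfy the edit-distance threshold, so it saves comparisons without changing the clustering result. Real tools use different identity definitions and filters, so their bounds differ.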
Pages: 596-607
Page count: 12