A fast clustering algorithm for analyzing highly similar compounds of very large libraries

Cited by: 17
Author
Li, Weizhong [1]
Affiliation
[1] Burnham Inst Med Res, La Jolla, CA 92037 USA
DOI
10.1021/ci0600859
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
As a result of the recent developments of high-throughput screening in drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. Here is described a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implemented with this method, called cd-hit-fp, is available from http://chemspace.org.
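The general idea of incremental clustering over 2D fingerprints can be illustrated with a minimal sketch. It is not the cd-hit-fp implementation: the abstract does not name the similarity measure, the threshold, or the input ordering, so the Tanimoto coefficient, the 0.6 cutoff, the representation of fingerprints as Python integers, and the function names `tanimoto` and `incremental_cluster` are all assumptions made for illustration.

```python
from typing import Dict, List


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets.

    Assumption: the source does not state which similarity measure is used.
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def incremental_cluster(
    fingerprints: List[int], threshold: float = 0.6
) -> Dict[int, List[int]]:
    """Greedy incremental clustering (hypothetical sketch).

    Compounds are processed once, in input order. Each compound joins the
    first existing cluster whose representative is at least `threshold`
    similar; otherwise it becomes the representative of a new cluster.
    Returns a mapping: representative index -> list of member indices.
    """
    clusters: Dict[int, List[int]] = {}
    for i, fp in enumerate(fingerprints):
        for rep in clusters:
            if tanimoto(fp, fingerprints[rep]) >= threshold:
                clusters[rep].append(i)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[i] = [i]
    return clusters


if __name__ == "__main__":
    # Four toy 6-bit fingerprints (made-up data).
    fps = [0b111100, 0b111000, 0b000011, 0b000111]
    print(incremental_cluster(fps, threshold=0.6))
```

Because each compound is compared only against cluster representatives rather than against every other compound, the number of comparisons grows with the number of clusters instead of the square of the library size, which is what makes this style of clustering attractive for highly redundant libraries.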
Pages: 1919-1923
Page count: 5