A fast clustering algorithm for analyzing highly similar compounds of very large libraries

Cited by: 17

Authors:

Li, Weizhong [1]

Affiliations:

[1] Burnham Inst Med Res, La Jolla, CA 92037 USA

Source:

JOURNAL OF CHEMICAL INFORMATION AND MODELING | 2006, Vol. 46, No. 5

DOI:

10.1021/ci0600859

Chinese Library Classification (CLC) number:

R914 [Medicinal chemistry];

Discipline classification code:

100701;

Abstract:

As a result of the recent developments of high-throughput screening in drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. Here is described a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implemented with this method, called cd-hit-fp, is available from http://chemspace.org.
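
The incremental clustering idea in the abstract can be illustrated with a minimal Python sketch. This is not the cd-hit-fp implementation; it assumes fingerprints are stored as Python integers used as bit sets, uses the Tanimoto coefficient as the similarity measure, processes compounds in input order, and uses hypothetical names (`tanimoto`, `incremental_cluster`) and an arbitrary threshold chosen only for illustration.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints held as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 1.0
    return bin(a & b).count("1") / union


def incremental_cluster(fingerprints, threshold=0.9):
    """Greedy one-pass clustering.

    Each compound joins the first existing representative whose similarity
    meets the threshold; otherwise it becomes a new representative.
    Returns a dict mapping representative index -> list of member indices.
    """
    clusters = {}  # insertion order doubles as the list of representatives
    for i, fp in enumerate(fingerprints):
        for rep in clusters:
            if tanimoto(fp, fingerprints[rep]) >= threshold:
                clusters[rep].append(i)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[i] = [i]
    return clusters


# Toy 8-bit fingerprints (illustrative data only, not real compounds).
example = [0b00001111, 0b00001110, 0b00000001, 0b11110000]
result = incremental_cluster(example, threshold=0.7)
print(result)
```

Because each compound is compared only against cluster representatives rather than against every other compound, the number of comparisons stays well below the all-pairs count when the library is highly redundant, which is the situation the abstract describes.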


Pages: 1919-1923

Page count: 5
