A fast clustering algorithm for analyzing highly similar compounds of very large libraries

Cited by: 17

Authors:

Li, Weizhong [1]

Affiliations:

[1] Burnham Inst Med Res, La Jolla, CA 92037 USA

Source:

JOURNAL OF CHEMICAL INFORMATION AND MODELING | 2006 / Vol. 46 / Issue 05

Keywords:

DOI:

10.1021/ci0600859

Chinese Library Classification (CLC) number:

R914 [Medicinal Chemistry];

Discipline classification code:

100701 ;

Abstract:

As a result of the recent developments of high-throughput screening in drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. Here is described a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implemented with this method, called cd-hit-fp, is available from http://chemspace.org.
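
The abstract names two ingredients, an incremental clustering pass and 2D fingerprints, without giving details. The Python sketch below illustrates the general idea only and should not be read as the cd-hit-fp implementation. It assumes that fingerprints are stored as integer bitsets, that similarity is measured with the Tanimoto coefficient, that compounds are processed in input order, and that a compound joins the first representative it matches. The function names and the threshold values are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def incremental_cluster(fingerprints, threshold=0.8):
    """Greedy one-pass clustering (illustrative, not the published method).

    Each fingerprint is compared with the existing cluster representatives.
    It joins the first cluster whose representative is at least `threshold`
    similar; otherwise it becomes the representative of a new cluster.
    """
    representatives = []  # index of the representative of each cluster
    clusters = []         # clusters[k] holds the member indices of cluster k
    for i, fp in enumerate(fingerprints):
        for k, rep in enumerate(representatives):
            if tanimoto(fp, fingerprints[rep]) >= threshold:
                clusters[k].append(i)
                break
        else:
            # No representative was similar enough: start a new cluster.
            representatives.append(i)
            clusters.append([i])
    return clusters


# Toy 8-bit fingerprints; real 2D fingerprints are much longer.
fps = [0b11110000, 0b11100000, 0b00001111, 0b00001110, 0b11110001]
result = incremental_cluster(fps, threshold=0.7)
print(result)
```

Because each compound is compared only with cluster representatives rather than with every other compound, the number of comparisons grows with the number of clusters instead of the square of the library size, which is the usual motivation for incremental schemes on highly redundant data.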


Pages: 1919-1923

Page count: 5

50 results in total

[21] A Fast Clustering Algorithm for Modularization of Large-Scale Software Systems
Teymourian, Navid
Izadkhah, Habib
Isazadeh, Ayaz
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (04) : 1451 - 1462
[22] A fast and recursive algorithm for clustering large datasets with k-medians
Cardot, Herve
Cenac, Peggy
Monnez, Jean-Marie
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (06) : 1434 - 1449
[23] Scalable grid-based clustering algorithm for very large spatial databases
Sun, Yufen
Lu, Yansheng
2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 763 - 768
[24] A fast Boyer-Moore type pattern matching algorithm for highly similar sequences
Ben Nsira, Nadia
Lecroq, Thierry
Elloumi, Mourad
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 13 (03) : 266 - 288
[25] STRUCTURAL AND DYNAMIC ANALYSIS OF VERY LARGE SYSTEMS USING A FAST PARALLEL ALGORITHM
ELMER, KH
APPLICATIONS OF SUPERCOMPUTERS IN ENGINEERING : FLUID FLOW AND STRESS ANALYSIS APPLICATIONS, 1989, : 229 - 238
[26] DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets
Ghanem, Tamer F.
Elkilani, Wail S.
Ahmed, Hatem S.
Hadhoud, Mohiy M.
2014 10TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO), 2014, : 26 - 35
[27] A Clustering and Routing Algorithm for Fast Changes of Large-Scale WSN in IoT
Fan, Bing
Xin, Yanan
IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (03) : 5036 - 5049
[28] DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets
Ghanem, Tamer F.
Elkilani, Wail S.
Ahmed, Hatem S.
Hadhoud, Mohiy M.
2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014, : 71 - 79
[29] WINP: A window-based incremental and parallel clustering algorithm for very large databases
Qiang, Z
Zheng, Z
Wei, SZ
Daley, E
ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 169 - 176
[30] A highly efficient multi-core algorithm for clustering extremely large datasets
Kraus, Johann M.
Kestler, Hans A.
BMC BIOINFORMATICS, 2010, 11
