A fast clustering algorithm for analyzing highly similar compounds of very large libraries

Cited by: 17
Author
Li, Weizhong [1]
Affiliation
[1] Burnham Inst Med Res, La Jolla, CA 92037 USA
DOI
10.1021/ci0600859
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
As a result of recent developments in high-throughput screening for drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. This paper describes a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. The method can cluster a very large data set with millions of compounds in hours on a single computer. A program implementing this method, called cd-hit-fp, is available from http://chemspace.org.
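The incremental clustering idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the cd-hit-fp implementation: the fingerprint encoding (Python integers used as bitsets), the Tanimoto similarity measure, the hypothetical default threshold of 0.8, and the decreasing-bit-count processing order are all assumptions made for illustration. Each compound is compared only against existing cluster representatives; it joins the first sufficiently similar one or else starts a new cluster, so the library is processed in a single pass.

```python
from typing import Dict, List


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def incremental_cluster(fps: Dict[str, int], threshold: float = 0.8) -> List[List[str]]:
    """Greedy single-pass clustering (illustrative sketch, not cd-hit-fp itself).

    Assumptions: fingerprints are int bitsets, similarity is Tanimoto, and
    compounds are visited in order of decreasing set-bit count, loosely
    mirroring cd-hit's longest-sequence-first ordering.
    """
    # Stable sort: ties keep their input order.
    order = sorted(fps, key=lambda cid: bin(fps[cid]).count("1"), reverse=True)
    reps: List[str] = []                 # representatives, in creation order
    clusters: Dict[str, List[str]] = {}  # representative id -> member ids
    for cid in order:
        fp = fps[cid]
        for rep in reps:
            # Join the first representative that is similar enough.
            if tanimoto(fp, fps[rep]) >= threshold:
                clusters[rep].append(cid)
                break
        else:
            # No representative matched: this compound founds a new cluster.
            reps.append(cid)
            clusters[cid] = [cid]
    return [clusters[r] for r in reps]
```

For example, with toy fingerprints `{"A": 0b11110000, "B": 0b11100000, "C": 0b00001111, "D": 0b00001110}` and `threshold=0.7`, the sketch returns `[["A", "B"], ["C", "D"]]`. Because each compound is compared only with representatives, the number of similarity evaluations is bounded by N times the number of clusters, which is far below N squared when the library is highly redundant. Any further speedups used by cd-hit-fp itself are not reproduced here.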
Pages: 1919-1923
Number of pages: 5
Related papers
50 in total
  • [21] A Fast Clustering Algorithm for Modularization of Large-Scale Software Systems
    Teymourian, Navid
    Izadkhah, Habib
    Isazadeh, Ayaz
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (04) : 1451 - 1462
  • [22] A fast and recursive algorithm for clustering large datasets with k-medians
    Cardot, Herve
    Cenac, Peggy
    Monnez, Jean-Marie
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (06) : 1434 - 1449
  • [23] Scalable grid-based clustering algorithm for very large spatial databases
    Sun, Yufen
    Lu, Yansheng
    2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 763 - 768
  • [24] A fast Boyer-Moore type pattern matching algorithm for highly similar sequences
    Ben Nsira, Nadia
    Lecroq, Thierry
    Elloumi, Mourad
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 13 (03) : 266 - 288
  • [25] STRUCTURAL AND DYNAMIC ANALYSIS OF VERY LARGE SYSTEMS USING A FAST PARALLEL ALGORITHM
    ELMER, KH
    APPLICATIONS OF SUPERCOMPUTERS IN ENGINEERING : FLUID FLOW AND STRESS ANALYSIS APPLICATIONS, 1989, : 229 - 238
  • [26] DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets
    Ghanem, Tamer F.
    Elkilani, Wail S.
    Ahmed, Hatem S.
    Hadhoud, Mohiy M.
    2014 10TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO), 2014, : 26 - 35
  • [27] A Clustering and Routing Algorithm for Fast Changes of Large-Scale WSN in IoT
    Fan, Bing
    Xin, Yanan
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (03) : 5036 - 5049
  • [28] DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets
    Ghanem, Tamer F.
    Elkilani, Wail S.
    Ahmed, Hatem S.
    Hadhoud, Mohiy M.
    2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014, : 71 - 79
  • [29] WINP: A window-based incremental and parallel clustering algorithm for very large databases
    Qiang, Z
    Zheng, Z
    Wei, SZ
    Daley, E
    ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 169 - 176
  • [30] A highly efficient multi-core algorithm for clustering extremely large datasets
    Kraus, Johann M.
    Kestler, Hans A.
    BMC BIOINFORMATICS, 2010, 11