INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Cited by: 16
Authors
CAN, F
FOX, EA
SNAVELY, CD
FRANCE, RK
Affiliations
[1] VIRGINIA POLYTECH INST & STATE UNIV,DEPT COMP SCI,BLACKSBURG,VA 24061
[2] VIRGINIA POLYTECH INST & STATE UNIV,CTR COMP,BLACKSBURG,VA 24061
DOI
10.1016/0020-0255(94)00111-N
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Clustering of document databases is useful for both browsing and searching; however, it can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms that maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). The experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.
Pages: 101-114
Number of pages: 14