INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Cited by: 16
Authors
CAN, F
FOX, EA
SNAVELY, CD
FRANCE, RK
Affiliations
[1] VIRGINIA POLYTECH INST & STATE UNIV,DEPT COMP SCI,BLACKSBURG,VA 24061
[2] VIRGINIA POLYTECH INST & STATE UNIV,CTR COMP,BLACKSBURG,VA 24061
DOI
10.1016/0020-0255(94)00111-N
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering of document databases is useful for both browsing and searching purposes; however, this can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms that maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). The experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.
Pages: 101-114
Page count: 14