INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Cited by: 16
Authors
CAN, F
FOX, EA
SNAVELY, CD
FRANCE, RK
Affiliations
[1] VIRGINIA POLYTECH INST & STATE UNIV,DEPT COMP SCI,BLACKSBURG,VA 24061
[2] VIRGINIA POLYTECH INST & STATE UNIV,CTR COMP,BLACKSBURG,VA 24061
DOI
10.1016/0020-0255(94)00111-N
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering of document databases is useful for both browsing and searching; however, it can be a prohibitively expensive computational process for large collections. The problem is compounded when the clustering structure must reflect a constantly changing database. Efficient algorithms that maintain an existing clustering structure are therefore desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). Experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access catalog (OPAC) project.
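The abstract names the cover-coefficient concept without defining it. As a rough illustration only, the sketch below computes cover coefficients from a small document-by-term matrix using the definition commonly given in the cover-coefficient clustering literature: c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk), where alpha_i and beta_k are the reciprocals of the row and column sums. The diagonal entries (decoupling coefficients) sum to the estimated number of clusters. The function names and the toy matrix are assumptions made for this example; this is not the MARIAN implementation described in the paper, and it omits seed selection and the incremental update step.

```python
from typing import List

Matrix = List[List[float]]


def cover_coefficients(d: Matrix) -> Matrix:
    """Return the m x m cover-coefficient matrix for an m x n document-term matrix.

    Assumes every document has at least one term and every term occurs in
    at least one document (otherwise a row or column sum would be zero).
    """
    m, n = len(d), len(d[0])
    alpha = [1.0 / sum(row) for row in d]                       # reciprocal row sums
    beta = [1.0 / sum(d[i][k] for i in range(m)) for k in range(n)]  # reciprocal column sums
    return [
        [alpha[i] * sum(d[i][k] * beta[k] * d[j][k] for k in range(n)) for j in range(m)]
        for i in range(m)
    ]


def estimated_cluster_count(d: Matrix) -> float:
    """Sum of the decoupling coefficients c_ii, used as the cluster-count estimate."""
    c = cover_coefficients(d)
    return sum(c[i][i] for i in range(len(d)))


# Toy matrix (hypothetical): documents 0 and 1 share both of their terms,
# document 2 uses a term of its own.
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
c = cover_coefficients(toy)
n_c = estimated_cluster_count(toy)
```

In this toy case the two identical documents cover each other equally and the third is fully decoupled, so the estimate comes out at two clusters. Each row of the matrix sums to 1, which is a quick sanity check on any implementation of the definition.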
Pages: 101-114
Page count: 14