INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Times cited: 16
Authors
CAN, F
FOX, EA
SNAVELY, CD
FRANCE, RK
Affiliations
[1] VIRGINIA POLYTECH INST & STATE UNIV,DEPT COMP SCI,BLACKSBURG,VA 24061
[2] VIRGINIA POLYTECH INST & STATE UNIV,CTR COMP,BLACKSBURG,VA 24061
DOI
10.1016/0020-0255(94)00111-N
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering of document databases is useful for both browsing and searching; however, it can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms that maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). The experiments performed on a sample of the MARIAN database show that the resource requirements of C²ICM are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.
Pages: 101-114
Number of pages: 14
Related papers
  • [1] An incremental document clustering for the large document database
    Joo, KH
    Lee, WS
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 374 - 387
  • [2] WINP: A window-based incremental and parallel clustering algorithm for very large databases
    Qiang, Z
    Zheng, Z
    Wei, SZ
    Daley, E
    ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 169 - 176
  • [3] WIDE: Clustering algorithm for very large databases
    School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban), 2006, 7 : 826 - 831
  • [4] Clustering and validation for very large databases (VLDB)
    Momin, Bashirahamad Fardin
    2006 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2007, : 258 - 263
  • [5] An incremental clustering scheme for duplicate detection in large databases
    Cesario, E
    Folino, F
    Manco, G
    Pontieri, L
    9TH INTERNATIONAL DATABASE ENGINEERING & APPLICATION SYMPOSIUM, PROCEEDINGS, 2005, : 89 - 95
  • [6] Effective incremental clustering for duplicate detection in large databases
    Folino, Francesco
    Manco, Giuseppe
    Pontieri, Luigi
    10TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2006, : 45 - 52
  • [7] Efficient clustering of very large document collections
    Dhillon, IS
    Fan, J
    Guan, YQ
    DATA MINING FOR SCIENTIFIC AND ENGINEERING APPLICATIONS, 2001, 2 : 357 - 381
  • [8] Short documents clustering in very large text databases
    Wang, Yongheng
    Jia, Yan
    Yang, Shuqiang
    WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 83 - 93
  • [9] Clustering in very large databases based on distance and density
    Qian, WN
    Gong, XQ
    Zhou, AY
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (01) : 67 - 76