A weighted common structure based clustering technique for XML documents

Cited by: 11
Authors
Hwang, Jeong Hee [2]
Ryu, Keun Ho [1]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju 361763, Chungbuk, South Korea
[2] Namseoul Univ, Dept Comp Sci, Cheonan 331707, Chungnam, South Korea
Keywords
Data mining; XML mining; Document clustering; XML clustering
DOI
10.1016/j.jss.2010.02.004
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because it is applicable across a wide range of applications. XML documents therefore constitute an important data mining domain. In this paper, we propose a new method of clustering XML documents by a global criterion function that considers the weight of common structures. Our approach first extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. We then cluster the XML documents by the weight of common structures, without a pairwise similarity measure, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of that transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach. Crown Copyright © 2010 Published by Elsevier Inc. All rights reserved.
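The document-as-transaction idea in the abstract can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the abstract does not give the paper's criterion function or its sequential pattern mining step, so the sketch substitutes plain root-to-node tag paths for mined structures and a simple greedy score (the average fraction of cluster members that share each structure) for the global criterion. The function names `tag_paths`, `to_transactions`, and `cluster` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def to_transactions(docs, min_support):
    """Turn each document into a transaction of frequent paths.

    A path is kept only if it occurs in at least min_support documents.
    (Assumption: stands in for the paper's sequential pattern mining.)
    """
    all_paths = [tag_paths(d) for d in docs]
    support = Counter(p for paths in all_paths for p in paths)
    frequent = {p for p, n in support.items() if n >= min_support}
    return [paths & frequent for paths in all_paths]


def cluster(transactions, threshold):
    """Greedy single-pass assignment with no pairwise document similarity.

    Assumption: an item's weight in a cluster is the fraction of the
    cluster's members that contain it; this is not the paper's criterion.
    """
    clusters = []  # each entry: {"members": [doc indices], "counts": Counter}
    for i, items in enumerate(transactions):
        best, best_score = None, 0.0
        for c in clusters:
            size = len(c["members"])
            score = sum(c["counts"][it] / size for it in items) / max(len(items), 1)
            if score > best_score:
                best, best_score = c, score
        if best is None or best_score < threshold:
            # No existing cluster shares enough structure: open a new one.
            best = {"members": [], "counts": Counter()}
            clusters.append(best)
        best["members"].append(i)
        best["counts"].update(items)
    return [c["members"] for c in clusters]
```

Calling `cluster(to_transactions(docs, min_support=2), threshold=0.5)` on a list of XML strings returns groups of document indices. Each document is compared only against per-cluster structure counts, never against other documents one by one, which is the property the abstract emphasizes.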
Pages: 1267-1274
Number of pages: 8
Related papers
50 in total
  • [41] Similarity Evaluation of XML Documents Based on Weighted Element Tree Model
    Wang, Chenying
    Yuan, Xiaojie
    Ning, Hua
    Lian, Xin
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 680 - 687
  • [42] Structural query expansion based on weighted query term for XML documents
    School of Information and Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China
    Unknown
    Ruan Jian Xue Bao/Journal of Software, 2008, 19 (10): : 2611 - 2619
  • [43] Annotation of Cultural Heritage Documents Based on XML Dictionaries and Data Clustering
    Theodosiou, Zenonas
    Georgiou, Olga
    Tsapatsoulis, Nicolas
    Kounoudes, Anastasis
    Milis, Marios
    DIGITAL HERITAGE, 2010, 6436 : 306 - +
  • [44] Overview of the INEX 2008 XML Mining Track Categorization and Clustering of XML Documents in a Graph of Documents
    Denoyer, Ludovic
    Gallinari, Patrick
    ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 401 - 411
  • [45] Classifying XML documents based on Structure/Content similarity
    Xing, Guangming
    Guo, Jinhua
    Xia, Zhonghang
    COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518 : 444 - 457
  • [46] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [47] Clustering XML Documents Using Frequent Subtrees
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
  • [48] Clustering XML documents using structural summaries
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 547 - 556
  • [49] Novel mixed clustering method for XML documents
    College of Information and Communications Engineering, Harbin Engineering University, Harbin 150001, China
    Unknown
    Harbin Gongcheng Daxue Xuebao, 2007, 6 (697-701):
  • [50] Similarity measurement of XML documents based on structure and contents
    Kim, Tae-Soon
    Lee, Ju-Hong
    Song, Jae-Won
    Kim, Deok-Hwan
    COMPUTATIONAL SCIENCE - ICCS 2007, PT 3, PROCEEDINGS, 2007, 4489 : 902 - +