A weighted common structure based clustering technique for XML documents

Cited by: 11
Authors
Hwang, Jeong Hee [2]
Ryu, Keun Ho [1]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju 361763, Chungbuk, South Korea
[2] Namseoul Univ, Dept Comp Sci, Cheonan 331707, Chungnam, South Korea
Keywords
Data mining; XML mining; Document clustering; XML clustering
DOI
10.1016/j.jss.2010.02.004
CLC number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, owing to its applicability in numerous domains. XML documents therefore constitute an important domain for data mining. In this paper, we propose a new XML document clustering method based on a global criterion function that takes the weight of common structures into account. Our approach first extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. We then cluster the XML documents by the weight of their common structures, without any pairwise similarity measure, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of that transaction. We conducted experiments comparing our method with previous methods, and the results show the effectiveness of our approach. Crown Copyright © 2010 Published by Elsevier Inc. All rights reserved.
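The two steps in the abstract (mine frequent structures, then cluster documents as transactions of those structures under a global criterion rather than pairwise similarity) can be illustrated with a small sketch. This is not the authors' algorithm and makes several loud assumptions: frequent root-to-node tag paths stand in for the structures that the paper extracts with sequential pattern mining, and the CLOPE transactional-clustering criterion stands in for the paper's weighted criterion function. CLOPE maximizes the sum over clusters of S(C)·|C| / W(C)^r divided by the total number of transactions, where S(C) is the total item count, W(C) the number of distinct items, and r a repulsion parameter; because that denominator is constant, the sketch maximizes only the numerator. The function names, the toy documents, and the settings `min_support=0.5` and `r=2.0` are hypothetical and purely illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_node_paths(xml_text):
    """Distinct root-to-node tag paths of one document, e.g. 'book/author/name'."""
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def frequent_structures(path_sets, min_support):
    """Assumption: frequent paths stand in for sequential-pattern-mined structures."""
    counts = Counter(p for paths in path_sets for p in paths)
    return {p for p, c in counts.items() if c >= min_support * len(path_sets)}


class Cluster:
    """Per-cluster summary for the CLOPE criterion (a stand-in for the paper's)."""

    def __init__(self):
        self.occ = Counter()  # item -> number of member transactions containing it
        self.n = 0            # |C|: number of transactions
        self.s = 0            # S(C): total item occurrences

    @property
    def w(self):
        return len(self.occ)  # W(C): number of distinct items

    def delta_add(self, t, r):
        """Change in S(C) * |C| / W(C)**r if transaction t joined this cluster."""
        s_new = self.s + len(t)
        w_new = self.w + sum(1 for item in t if item not in self.occ)
        old = self.s * self.n / self.w ** r if self.n else 0.0
        return s_new * (self.n + 1) / w_new ** r - old

    def add(self, t):
        self.occ.update(t)
        self.n += 1
        self.s += len(t)

    def remove(self, t):
        self.occ.subtract(t)
        self.occ = +self.occ  # drop items whose count fell to zero
        self.n -= 1
        self.s -= len(t)


def place(clusters, t, r):
    """Add t to the existing or new cluster with the largest gain; return its index."""
    candidates = clusters + [Cluster()]
    best = max(range(len(candidates)), key=lambda k: candidates[k].delta_add(t, r))
    if best == len(clusters):
        clusters.append(candidates[best])
    clusters[best].add(t)
    return best


def clope(transactions, r=2.0, max_iter=10):
    """Cluster non-empty item sets without pairwise similarity; one label each."""
    clusters = []
    labels = [place(clusters, t, r) for t in transactions]  # phase 1: greedy pass
    for _ in range(max_iter):  # phase 2: relocate until no transaction moves
        moved = False
        for idx, t in enumerate(transactions):
            clusters[labels[idx]].remove(t)
            new = place(clusters, t, r)
            moved |= new != labels[idx]
            labels[idx] = new
        if not moved:
            break
    order = {c: k for k, c in enumerate(dict.fromkeys(labels))}
    return [order[c] for c in labels]  # renumber clusters 0, 1, ... by first use


docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<movie><title/><director><name/></director></movie>",
    "<movie><title/><director><name/></director><year/></movie>",
]
path_sets = [root_to_node_paths(d) for d in docs]
frequent = frequent_structures(path_sets, min_support=0.5)
# Each document becomes a transaction whose items are its frequent structures.
transactions = [frozenset(p & frequent) for p in path_sets]
labels = clope(transactions, r=2.0)
print(labels)  # [0, 0, 1, 1]
```

With these toy inputs the infrequent `year` paths are filtered out, and the two `book` documents and the two `movie` documents land in separate clusters. In phase 2, removing a transaction before re-placing it yields the same best move as CLOPE's add-plus-remove comparison, because the removal gain is identical for every candidate cluster.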
Pages: 1267-1274
Page count: 8
Related papers
50 in total
  • [21] Clustering XML Documents based on Data Type
    Zhou, Chong
    Lu, Yansheng
    2008 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, VOLS 1 AND 2, PROCEEDINGS, 2008, : 685 - 690
  • [22] An efficient and scalable algorithm for clustering XML documents by structure
    Lian, W
    Cheung, DWL
    Mamoulis, N
    Yiu, SM
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (01) : 82 - 96
  • [23] XML clustering based on common neighbor
    Lv, TY
    Zhang, XZ
    Zuo, WL
    Wang, ZX
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, PROCEEDINGS, 2006, 3842 : 137 - 141
  • [24] XML Document Clustering Based on Common Tag Names Anywhere in the Structure
    Alishahi, Mohamad
    Ravakhah, Mehdi
    Shakeriaski, Baharak
    Naghibzade, Mahmud
    2009 14TH INTERNATIONAL COMPUTER CONFERENCE, 2009, : 587 - +
  • [25] Clustering schemaless XML documents
    Shen, Y
    Wang, B
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2003: COOPIS, DOA, AND ODBASE, 2003, 2888 : 767 - 784
  • [26] XML documents clustering by structures
    Nayak, Richi
    Xu, Sumei
    ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 : 432 - 442
  • [27] Semantic Clustering of XML Documents
    Tagarelli, Andrea
    Greco, Sergio
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2010, 28 (01)
  • [28] Collaborative clustering of XML documents
    Greco, Sergio
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2011, 77 (06) : 988 - 1008
  • [29] Clustering XML documents by patterns
    Piernik, Maciej
    Brzezinski, Dariusz
    Morzy, Tadeusz
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 46 (01) : 185 - 212
  • [30] Collaborative Clustering of XML Documents
    Greco, Sergio
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    2009 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPPW 2009), 2009, : 579 - 586