A tree-based approach to clustering XML documents by structure

Cited by: 0
Authors
Costa, G
Manco, G
Ortale, R
Tagarelli, A
Affiliations
[1] Inst Italian Natl Res Council, CNR, ICAR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
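The representative-based loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the node-matching, merging, or pruning techniques, so this sketch assumes a much simpler stand-in. Each document is reduced to its set of root-to-node tag paths, similarity is the Jaccard coefficient on those sets, and a representative is the set of paths occurring in at least a fraction min_support of a cluster's members (merge by counting, prune by support). The names tag_paths, jaccard, representative, cluster, threshold, and min_support are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard similarity of two path sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def representative(members, min_support):
    """Merge the members' path sets, then prune infrequent paths."""
    counts = Counter(p for paths in members for p in paths)
    return {p for p, c in counts.items() if c / len(members) >= min_support}


def cluster(docs, threshold=0.5, min_support=0.6):
    """Assign each document to the closest representative, or open a new cluster."""
    clusters = []  # each cluster: member path sets, document ids, representative
    for i, text in enumerate(docs):
        paths = tag_paths(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(paths, c["rep"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(paths)
            best["ids"].append(i)
            # Update the representative as soon as the cluster changes.
            best["rep"] = representative(best["members"], min_support)
        else:
            clusters.append({"members": [paths], "ids": [i], "rep": set(paths)})
    return clusters
```

In this simplified setting, a path that appears in only a minority of a cluster's documents is dropped from the representative, which loosely mirrors the idea of keeping only the "most typical structural specifics". Flattening trees into path sets ignores sibling order and repetition, which a real tree-matching approach would account for.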
Pages: 137-148
Page count: 12
Related papers
50 in total
  • [31] Collaborative Clustering of XML Documents
    Greco, Sergio
    Gullo, Francesco
    Ponti, Giovanni
    Tagarelli, Andrea
    2009 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPPW 2009), 2009, : 579 - 586
  • [32] Multisets and clustering XML documents
    Iyer, Swami
    Simovici, Dan A.
    19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL I, PROCEEDINGS, 2007, : 267 - 274
  • [33] Clustering XML documents by patterns
    Maciej Piernik
    Dariusz Brzezinski
    Tadeusz Morzy
    Knowledge and Information Systems, 2016, 46 : 185 - 212
  • [34] An Efficient Association Rule Based Clustering of XML Documents
    Muralidhar, A.
    Pattabiraman, V.
    BIG DATA, CLOUD AND COMPUTING CHALLENGES, 2015, 50 : 401 - 407
  • [35] Clustering Algorithm Based on Semantic Distance for XML Documents
    Yang, Lingxian
    Gu, Jinguang
    Chen, Heping
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 549 - +
  • [36] A Self-Organising Map approach for clustering of XML documents
    Trentimi, F.
    Hagenbuchner, M.
    Sperduti, A.
    Scarselli, F.
    Tsoi, A. C.
    2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 1805 - +
  • [37] Structural-based Clustering Technique of XML Documents
    Posonia, Mary A.
    Jyothi, V. L.
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON CIRCUITS, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2013), 2013, : 1239 - 1242
  • [38] A structure preserving approach for securing XML documents
    Nabeel, Mohamed
    Bertino, Elisa
    2007 INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, 2008, : 8 - 15
  • [39] Structural-Based clustering technique of XML documents
    Mary Posonia, A.
    Jyothi, V.L.
    Proceedings of IEEE International Conference on Circuit, Power and Computing Technologies, ICCPCT 2013, 2013, : 1239 - 1242
  • [40] All common embedded subtrees for clustering XML documents by structure
    Lin, Zhiwei
    Wang, Hui
    McClean, Sally
    Wang, Haiying
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 13 - 18