A tree-based approach to clustering XML documents by structure

Cited by: 0
Authors
Costa, G
Manco, G
Ortale, R
Tagarelli, A
Affiliations
[1] Inst Italian Natl Res Council, CNR, ICAR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
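Source
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS | 2004 / Vol. 3202
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract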
We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
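The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a much simpler stand-in for their techniques: each document is reduced to its set of root-to-node tag paths, a cluster representative is the merged path set pruned by a hypothetical `min_support` frequency cutoff, and documents are compared to representatives with Jaccard similarity against a hypothetical `threshold`. All function names and parameters are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def representative(members, min_support=0.5):
    """Merge the members' path sets, then prune paths that are too rare.

    Assumption: frequency-based pruning stands in for the paper's
    merging and pruning of XML trees.
    """
    counts = Counter(p for paths in members for p in paths)
    return {p for p, c in counts.items() if c / len(members) >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two path sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(docs, threshold=0.5):
    """Greedy clustering: compare each document with the representatives,
    join the closest cluster if it is similar enough, and refresh that
    cluster's representative; otherwise start a new cluster."""
    clusters = []  # each cluster is {"members": [path sets], "rep": path set}
    for doc in docs:
        paths = tag_paths(doc)
        best = max(clusters, key=lambda c: jaccard(paths, c["rep"]), default=None)
        if best is not None and jaccard(paths, best["rep"]) >= threshold:
            best["members"].append(paths)
            best["rep"] = representative(best["members"])
        else:
            clusters.append({"members": [paths], "rep": set(paths)})
    return clusters


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><year/></book>",
        "<order><item/><price/></order>",
    ]
    for i, c in enumerate(cluster(docs)):
        print(i, len(c["members"]), sorted(c["rep"]))
```

In this sketch the representative is a set of paths rather than an XML document, which keeps the example short but discards sibling order and repetition; a tree-based representative as described in the abstract would preserve more structure.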
Pages: 137-148
Number of pages: 12