A novel method for mining frequent subtrees from XML data

Cited by: 0
Authors
Zhang, WS [1]
Liu, DX [1]
Zhang, JP [1]
Affiliations
[1] Harbin Engn Univ, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we focus on the problem of finding frequent subtrees in a large collection of XML data, where both the patterns and the data are modeled by labeled ordered trees. We present an efficient algorithm, RSTMiner, that uses a special structure, the Me-tree, to compute all rooted subtrees whose frequency in a collection of XML data trees is above a user-specified threshold. In this algorithm, the Me-tree is used as a merging tree that supplies scheme information for efficient pruning and mining of frequent subtrees. The keys to the algorithm are efficient pruning of candidates with the Me-tree structure and incremental enumeration of all rooted subtrees in canonical form, based on an extended rightmost expansion technique. Experimental results show that the RSTMiner algorithm is efficient and scalable.
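The abstract does not describe the Me-tree or the extended expansion step in detail, so the following is only a minimal sketch of plain rightmost expansion with naive support counting, not RSTMiner itself. It assumes that patterns and data trees are stored as preorder sequences of (depth, label) pairs, and that support means the number of data trees containing the pattern as an order-preserving, parent-child-preserving subtree. All function names are hypothetical.

```python
def rightmost_expansions(tree, labels):
    """Yield every tree obtained by attaching one new node to the rightmost path."""
    last_depth = tree[-1][0]
    # A new node at depth d becomes the last child of the rightmost-path node at depth d - 1.
    for depth in range(1, last_depth + 2):
        for label in labels:
            yield tree + ((depth, label),)


def to_nested(tree):
    """Convert a preorder (depth, label) sequence into nested (label, children) form."""
    root = (tree[0][1], [])
    stack = [root]  # stack[d] is the most recent node at depth d
    for depth, label in tree[1:]:
        node = (label, [])
        del stack[depth:]           # keep only the ancestors of the new node
        stack[-1][1].append(node)
        stack.append(node)
    return root


def matches_at(pattern, node):
    """True if the pattern root maps to `node` and child order is preserved."""
    if pattern[0] != node[0]:
        return False
    children, i = node[1], 0
    for p_child in pattern[1]:
        # Greedy leftmost matching of pattern children onto data children.
        while i < len(children) and not matches_at(p_child, children[i]):
            i += 1
        if i == len(children):
            return False
        i += 1
    return True


def occurs_in(pattern, node):
    """True if the pattern matches at `node` or at any of its descendants."""
    return matches_at(pattern, node) or any(occurs_in(pattern, c) for c in node[1])


def mine_frequent_subtrees(data, min_support):
    """Level-wise mining: only frequent patterns are expanded (assumed support definition)."""
    labels = sorted({label for tree in data for _, label in tree})
    nested_data = [to_nested(tree) for tree in data]
    frequent = {}
    frontier = [((0, label),) for label in labels]
    while frontier:
        next_frontier = []
        for candidate in frontier:
            pattern = to_nested(candidate)
            support = sum(occurs_in(pattern, tree) for tree in nested_data)
            if support >= min_support:
                frequent[candidate] = support
                next_frontier.extend(rightmost_expansions(candidate, labels))
        frontier = next_frontier
    return frequent
```

Because every ordered tree has exactly one predecessor under this scheme (the tree with its last preorder node removed), each candidate is generated once, and discarding an infrequent candidate safely discards all of its expansions.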
Pages: 300 - 305
Number of pages: 6
Related papers
50 entries in total
  • [1] Mining frequent rooted subtrees in XML data with Me-tree
    Zhang, WS
    Liu, DX
    Zhang, JP
    2004 IEEE SYSTEMS & INFORMATION ENGINEERING DESIGN SYMPOSIUM, 2004, : 209 - 214
  • [3] Discovering frequent subtrees from XML data using neural networks
    College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
    Wuhan Univ J Nat Sci, 2006, 1 (117-121):
  • [4] Efficient data mining for maximal frequent subtrees
    Xiao, YQ
    Yao, JF
    Li, ZG
    Dunham, MH
    THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 379 - +
  • [5] Mining frequent patterns from XML data
    Win, Chit Nilar
    Hla, Khin Haymar Saw
    APSITT 2005: 6th Asia-Pacific Symposium on Information and Telecommunication Technologies, Proceedings, 2005, : 208 - 212
  • [6] Mining subtrees with frequent occurrence of similar subtrees
    Tosaka, Hisashi
    Nakamura, Atsuyoshi
    Kudo, Mineichi
    DISCOVERY SCIENCE, PROCEEDINGS, 2007, 4755 : 286 - +
  • [7] EXiT-B: A new approach for extracting maximal frequent subtrees from XML data
    Paik, J
    Won, D
    Fotouhi, F
    Kim, UM
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 1 - 8
  • [8] Clustering XML Documents Using Frequent Subtrees
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
  • [9] Mining frequent patterns from Xml data based on vertical data
    Dai, Shangping
    Xie, Xiangming
    He, Tian
    DCABES 2007 PROCEEDINGS, VOLS I AND II, 2007, : 798 - 800
  • [10] Mining Compressed Frequent Subtrees Set
    ZHAO Chuanshen1
    2. School of Computer Science and Engineering
    3. Department of Computer
    Wuhan University Journal of Natural Sciences, 2009, 14 (01) : 29 - 34