Extended VSM for XML Document Classification Using Frequent Subtrees

Cited by: 0
Authors
Yang, Jianwu [1]
Wang, Songlin [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Tech, Beijing 100871, Peoples R China
Source
Keywords
XML Document; Classification; Vector Space Model (VSM); Structured Link Vector Model (SLVM); Frequent Subtree
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Structured link vector model (SLVM) is a representation proposed for modeling XML documents. It extends the conventional vector space model (VSM) by incorporating document structures. In this paper, we describe the SLVM-based classification approach for XML documents that we used in the Document Mining Challenge of INEX 2009, where closed frequent subtrees serve as structural units for extracting content from the XML documents and the Chi-square test is used for feature selection.
Pages: 441 - 448
Page count: 8
Related papers
50 in total
  • [41] An efficient algorithm for mining both closed and maximal frequent free subtrees using canonical forms
    Guo, P
    Zhou, Y
    Zhuang, J
    Chen, T
    Kang, YR
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 96 - 107
  • [42] Mining closed and maximal frequent embedded subtrees using length-decreasing support constraint
    Ji, Gen-Lin
    Zhu, Ying-Wen
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 268 - 273
  • [43] XML document summarization: Using XQuery for synopsis creation
    Comai, S
    Marrara, S
    Tanca, L
    15TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, : 928 - 932
  • [44] An Improved XML Document Clustering Using Path Feature
    Yuan, Jin-sha
    Li, Xin-ye
    Ma, Li-na
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 400 - +
  • [45] Using rich document representation in XML information retrieval
    Raja, Fahimeh
    Keikha, Mostafa
    Rahgozar, Masued
    Oroumchian, Farhad
    COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518 : 294 - 301
  • [46] Bilingual legal document retrieval and management using XML
    Luk, RWP
    T'sou, BKY
    Lai, TBY
    Kwong, OOY
    Chik, FCY
    Cheung, LYL
    SOFTWARE-PRACTICE & EXPERIENCE, 2003, 33 (01): : 41 - 59
  • [47] Web dual watermarking technology using an XML document
    Jin, C.
    Qu, Z. -G.
    Zhang, Z. -M.
    Jiang, Y.
    IET INFORMATION SECURITY, 2007, 1 (01) : 37 - 42
  • [48] XML query processing using document type definitions
    Chung, TS
    Kim, HJ
    JOURNAL OF SYSTEMS AND SOFTWARE, 2002, 64 (03) : 195 - 205
  • [49] Computing repairs for inconsistent XML document using chase
    Tan, Zijing
    Zhang, Zijun
    Wang, Wei
    Shi, Baile
    ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS, 2007, 4505 : 293 - +
  • [50] Caching system for XML queries using frequent query patterns
    Bei, Yijun
    Chen, Gang
    Hu, Tianlei
    Dong, Jinxiang
    PROCEEDINGS OF THE 2007 11TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOLS 1 AND 2, 2007, : 47 - +