XML clustering by principal component analysis

Times cited: 0
Authors
Liu, JH [1]
Wang, JTL [1]
Hsu, W [1]
Herbert, KG [1]
Affiliation
[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA
DOI: not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents that share the same DTD. Our approach extracts features from the documents, modeled as ordered labeled trees, and transforms each document into a vector in a high-dimensional Euclidean space based on the occurrences of the features in that document. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced space. PCA makes it possible to identify vectors with co-occurring features, thereby improving the accuracy of the clustering. Experimental results on documents obtained from Wisconsin's XML data bank demonstrate the effectiveness and good performance of the proposed techniques.
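The pipeline summarized in the abstract (tree features, occurrence vectors, PCA, clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say which tree features are extracted, so root-to-node tag paths (for example `/book/title`) are assumed here; PCA is computed by power iteration with deflation on the sample covariance matrix; and because the abstract does not name the clustering algorithm, plain k-means with farthest-first initialization stands in. All function names and the `SAMPLE_DOCS` documents are hypothetical.

```python
import math
import random
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Count root-to-node tag paths such as '/book/title'.
    Assumption: path features stand in for the paper's unspecified tree features."""
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def vectorize(docs):
    """Map each document to an occurrence-count vector over a shared feature vocabulary."""
    feats = [path_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return [[f[p] for p in vocab] for f in feats], vocab


def pca_project(X, n_components, iters=200, seed=0):
    """Project the rows of X onto the top principal components (needs >= 2 rows)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix, d x d.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        # Power iteration from a random unit vector converges to the top eigenvector.
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to capture
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass finds the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain k-means (an assumed stand-in) with deterministic farthest-first seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_xml(docs, n_components=2, n_clusters=2):
    X, _ = vectorize(docs)
    return kmeans(pca_project(X, n_components), n_clusters)


SAMPLE_DOCS = [  # hypothetical documents that all follow one "book" DTD
    "<book><title>x</title><author>a</author><author>b</author></book>",
    "<book><title>y</title><author>c</author><author>d</author><author>e</author></book>",
    "<book><title>z</title><chapter><section/><section/></chapter></book>",
    "<book><title>w</title><chapter><section/></chapter>"
    "<chapter><section/><section/></chapter></book>",
]

if __name__ == "__main__":
    print(cluster_xml(SAMPLE_DOCS))
```

In this toy input all four documents share one DTD but differ in which paths co-occur (`author` paths versus `chapter` and `section` paths). The first principal component captures that contrast, so the author-style and chapter-style documents should land in separate clusters after reduction from five dimensions to two.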
Pages: 658-662
Page count: 5