XML clustering by principal component analysis

Times cited: 0

Authors:

Liu, JH [1]

Wang, JTL [1]

Hsu, W [1]

Herbert, KG [1]

Affiliation:

[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA

Source:

ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2004

DOI:

Not available

CLC classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing, and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents that share the same DTD. Our approach extracts features from the documents, modeled as ordered labeled trees, and transforms each document into a vector in a high-dimensional Euclidean space based on the occurrences of those features in the document. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced space. PCA makes it possible to identify vectors with co-occurring features, thereby improving clustering accuracy. Experimental results on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques.
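
The abstract describes a pipeline with four steps: tree features, occurrence vectors, PCA, and clustering. The sketch below illustrates that pipeline in Python under stated assumptions. It is not the authors' implementation. The abstract does not say which tree features the paper extracts, so root-to-node label paths are used as a stand-in. It also does not name the clustering algorithm, so k-means is substituted here. PCA is computed by power iteration with deflation so the example needs only the standard library. All function names (`path_features`, `vectorize`, `pca`, `kmeans`, `cluster_xml`) are hypothetical.

```python
import math
import random
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Count root-to-node label paths in one XML document (an ordered labeled tree).

    ASSUMPTION: label paths stand in for the paper's tree features,
    which the abstract does not specify.
    """
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:  # children are visited in document order
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def vectorize(docs):
    """Map each document to a vector of feature-occurrence counts over a shared vocabulary."""
    feats = [path_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return [[f[p] for p in vocab] for f in feats], vocab


def pca(X, k, iters=300, seed=0):
    """Project the rows of X (at least two rows) onto the top-k principal components.

    Power iteration with deflation on the sample covariance matrix: fine for
    small examples, too slow for large feature spaces.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[x - m for x, m in zip(row, mean)] for row in X]  # centered data
    cov = [[sum(r[a] * r[b] for r in C) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left: use a zero component
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Hotelling deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in C]


def kmeans(P, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (a substitute clusterer)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [list(P[0])]
    while len(centers) < k:
        centers.append(list(max(P, key=lambda p: min(d2(p, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: d2(p, centers[c])) for p in P]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(P, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_xml(docs, n_components=2, n_clusters=2):
    """Features -> occurrence vectors -> PCA -> clustering."""
    X, _ = vectorize(docs)
    return kmeans(pca(X, n_components), n_clusters)
```

As an example, consider bibliography entries that all follow one DTD. Some use `author` and `journal`, while others use `editor`, `publisher`, and `isbn`. Features that tend to co-occur in such entries load onto the same leading component. As a result, `cluster_xml` on such a set would be expected to separate the two kinds of entry.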

Pages: 658-662

Number of pages: 5
