Mining evolutionary events from multi-streams based on spectral clustering

Cited by: 1
Authors
Yang N. [1 ]
Tang C.-J. [1 ]
Wang Y. [1 ]
Chen Y. [1 ]
Zheng J.-L. [1 ]
Affiliation
[1] College of Computer Science, Sichuan University
Source
Ruan Jian Xue Bao/Journal of Software | 2010 / Vol. 21 / No. 10
Keywords
Evolutionary event; Matrix perturbation; Multi-streams; Spectral clustering
DOI
10.3724/SP.J.1001.2010.03745
Abstract
To address the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate clustering models of multi-streams. The similarity matrix in these clustering models is based on the Coupling Degree, which measures the dynamic similarity between two streams. The paper also proposes an algorithm, EEMA (evolutionary events mining algorithm), to discover evolutionary event points based on the drift of clustering models. EEMA takes the Clustering Model Quality index as its optimization objective when automatically determining the number of clusters. Clustering Model Quality combines matrix perturbation theory with Clustering Cohesion, which has a sound upper bound and measures the compactness of a clustering model. Finally, the paper presents O-EEMA (optimized-EEMA), an optimization of EEMA with time complexity O(cn^2/2). Extensive experiments on synthetic and real data sets show that EEMA and O-EEMA are effective and practical. © by Institute of Software, the Chinese Academy of Sciences.
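The pipeline summarized in the abstract (build a stream-to-stream similarity matrix, cluster it spectrally, choose the number of clusters automatically, and flag an event when the clustering model drifts between windows) can be illustrated with a minimal sketch. This is not the paper's SCAM or EEMA: the abstract does not define Coupling Degree or Clustering Model Quality, so the sketch assumes absolute Pearson correlation as a stand-in similarity and the eigengap heuristic as a stand-in for choosing the number of clusters. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def similarity_matrix(window):
    # Assumption: absolute Pearson correlation stands in for the paper's
    # Coupling Degree, which the abstract does not define.
    return np.abs(np.corrcoef(window))

def spectral_embedding(similarity, max_k):
    # Normalized Laplacian L = I - D^{-1/2} S D^{-1/2}.
    scale = 1.0 / np.sqrt(similarity.sum(axis=1))
    laplacian = np.eye(len(similarity)) - scale[:, None] * similarity * scale[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Assumption: eigengap heuristic stands in for Clustering Model Quality.
    gaps = np.diff(eigvals[: max_k + 1])
    k = int(np.argmax(gaps)) + 1
    rows = eigvecs[:, :k]
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    return k, rows / np.maximum(norms, 1e-12)

def kmeans(points, k, iterations=20):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_window(window, max_k=4):
    # One clustering model per time window: (number of clusters, stream labels).
    k, embedding = spectral_embedding(similarity_matrix(window), max_k)
    return k, kmeans(embedding, k)

def model_drifted(labels_a, labels_b):
    # Compare co-membership matrices so that label renaming is not a drift.
    co_a = labels_a[:, None] == labels_a[None, :]
    co_b = labels_b[:, None] == labels_b[None, :]
    return not np.array_equal(co_a, co_b)

# Synthetic data: six streams, three following each of two base signals.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
slow, fast = np.sin(t), np.sin(3 * t)

def noisy(signal):
    return signal + 0.05 * rng.standard_normal(t.size)

window_1 = np.array([noisy(slow) for _ in range(3)] + [noisy(fast) for _ in range(3)])
window_2 = window_1.copy()
window_2[2] = noisy(fast)  # stream 2 changes behavior in the second window

k1, labels_1 = cluster_window(window_1)
k2, labels_2 = cluster_window(window_2)
event = model_drifted(labels_1, labels_2)
print(k1, labels_1, k2, labels_2, event)
```

Comparing co-membership rather than raw labels is a deliberate choice in this sketch: cluster indices are arbitrary, so only a change in which streams are grouped together should count as a candidate event point.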
Pages: 2395-2409
Page count: 14