PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING

Times cited: 2
Authors
Chehreghani, Morteza Haghir [1]
Chehreghani, Mostafa Haghir [1]
Abolhassani, Hassan [1]
Affiliation
[1] Sharif Univ Technol, Fac Comp Engn, Web Intelligence Lab, Dept Comp Engn, Tehran, Iran
Keywords
data mining; Web clustering; Bayesian networks; hierarchical clustering; representative point
DOI
10.1111/j.1467-8640.2012.00414.x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering Web data is an important technique for extracting knowledge from the Web. In this paper, a novel method is presented to facilitate such clustering. The method determines the appropriate number of clusters and provides suitable representatives for each cluster by inference from a Bayesian network. Furthermore, the Bayesian network is used to convert the contents of the Web pages into lower-dimensional vectors. The method is also extended to hierarchical clustering, and a useful heuristic is developed for selecting a good hierarchy. Experimental results show that the resulting clusters are of high quality.
Pages: 209-233
Page count: 25