PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING

Cited by: 2
Authors
Chehreghani, Morteza Haghir [1]
Chehreghani, Mostafa Haghir [1]
Abolhassani, Hassan [1]
Affiliations
[1] Sharif Univ Technol, Fac Comp Engn, Web Intelligence Lab, Dept Comp Engn, Tehran, Iran
Keywords
data mining; Web clustering; Bayesian networks; hierarchical clustering; representative point
DOI
10.1111/j.1467-8640.2012.00414.x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering Web data is an important technique for extracting knowledge from the Web. In this paper, a novel method is presented to facilitate the clustering. The method determines the appropriate number of clusters and provides suitable representatives for each cluster by inference from a Bayesian network. Furthermore, by means of the Bayesian network, the contents of the Web pages are converted into vectors of lower dimensions. The method is also extended to hierarchical clustering, and a useful heuristic is developed to select a good hierarchy. The experimental results show that the clusters produced are of high quality.
Pages: 209 - 233
Number of pages: 25
Related papers
50 in total
  • [1] Web Data Extraction with Hierarchical Clustering and Rich Features
    Dong, Yongquan
    Zhao, Xiangjun
    Zhang, Gongjie
    RECENT TRENDS IN MATERIALS AND MECHANICAL ENGINEERING MATERIALS, MECHATRONICS AND AUTOMATION, PTS 1-3, 2011, 55-57 : 1003 - 1008
  • [2] Visual heuristics for data clustering
    TranLuu, TD
    DeClaris, N
    SMC '97 CONFERENCE PROCEEDINGS - 1997 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: CONFERENCE THEME: COMPUTATIONAL CYBERNETICS AND SIMULATION, 1997 : 19 - 24
  • [3] Fast approximate hierarchical clustering using similarity heuristics
    Meelis Kull
    Jaak Vilo
    BioData Mining, 1
  • [4] Fast approximate hierarchical clustering using similarity heuristics
    Kull, Meelis
    Vilo, Jaak
    BIODATA MINING, 2008, 1 (1)
  • [5] A hierarchical clustering approach to identify repeated enrollments in web survey data
    Handorf, Elizabeth A.
    Heckman, Carolyn J.
    Darlow, Susan
    Slifker, Michael
    Ritterband, Lee
    PLOS ONE, 2018, 13 (09):
  • [6] Hierarchical clustering algorithm for categorical data using a probabilistic rough set model
    Li, Min
    Deng, Shaobo
    Wang, Lei
    Feng, Shengzhong
    Fan, Jianping
    KNOWLEDGE-BASED SYSTEMS, 2014, 65 : 60 - 71
  • [7] Bidirectional hierarchical clustering for web mining
    Yao, ZM
    Choi, B
    IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2003 : 620 - 624
  • [8] Clustering Web Pages into Hierarchical Categories
    Yao, Zhongmei
    Choi, Ben
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2007, 3 (02) : 17 - 35
  • [9] Clustering Heterogeneous Web Usage Data Using Hierarchical Particle Swarm Optimization
    Alam, Shafiq
    Dobbie, Gillian
    Koh, Yun Sing
    Riddle, Patricia
    2013 IEEE SYMPOSIUM ON SWARM INTELLIGENCE (SIS), 2013 : 147 - 154
  • [10] Probabilistic clustering of interval data
    Brito, Paula
    Pedro Duarte Silva, A.
    Dias, Jose G.
    INTELLIGENT DATA ANALYSIS, 2015, 19 (02) : 293 - 313