Query directed web page clustering

Cited by: 9
Authors
Crabtree, Daniel [1]
Andreae, Peter [1]
Gao, Xiaoying [1]
Affiliations
[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, Wellington, New Zealand
DOI
10.1109/WI.2006.142
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement, but finding clusters that are semantically meaningful to users is difficult. In this paper we describe a new web page clustering algorithm, QDC, which uses the user's query as part of a reliable measure of cluster quality. The new algorithm has five key innovations: a new query directed cluster quality guide that uses the relationship between clusters and the query, an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap, a new cluster splitting method that fixes the cluster chaining or cluster drifting problem, an improved heuristic for cluster selection that uses the query directed cluster quality guide, and a new method of improving clusters by ranking the pages by relevance to the cluster. We evaluate QDC by comparing its clustering performance against that of four other algorithms on eight data sets (four use full text data and four use snippet data) by using eleven different external evaluation measurements. We also evaluate QDC by informally analysing its real world usability and performance through comparison with six other algorithms on four data sets. QDC provides a substantial performance improvement over other web page clustering algorithms.
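The merging idea mentioned in the abstract (using cluster description similarity in addition to cluster overlap) can be illustrated with a minimal sketch. This is not the QDC implementation: the abstract gives no formulas, so the `Cluster` representation, the use of Jaccard similarity for both measures, and the two thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical representation: a cluster is a set of descriptive
    # terms plus the set of page identifiers it contains.
    description: frozenset
    pages: frozenset


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(c1, c2, min_overlap=0.5, min_desc_sim=0.5):
    """Merge only if the clusters share enough pages AND their
    descriptions are similar enough (thresholds are assumed values)."""
    overlap = jaccard(c1.pages, c2.pages)
    desc_sim = jaccard(c1.description, c2.description)
    return overlap >= min_overlap and desc_sim >= min_desc_sim


# Three clusters for the ambiguous query "jaguar" with equal page overlap.
cars = Cluster(frozenset({"jaguar", "car"}), frozenset({1, 2, 3, 4}))
dealers = Cluster(frozenset({"jaguar", "car", "dealer"}), frozenset({2, 3, 4, 5}))
cats = Cluster(frozenset({"jaguar", "cat"}), frozenset({2, 3, 4, 6}))

merge_cars_dealers = should_merge(cars, dealers)
merge_cars_cats = should_merge(cars, cats)
```

In this toy data both pairs overlap on the same fraction of pages, so an overlap-only test would treat them alike. Adding the description check keeps the car and cat clusters apart, which is the kind of semantic coherence the abstract attributes to the improved merging method.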
Pages: 202 / +
Page count: 2