Query directed web page clustering

Cited by: 9
Authors
Crabtree, Daniel [1]
Andreae, Peter [1]
Gao, Xiaoying [1]
Affiliation
[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, Wellington, New Zealand
DOI
10.1109/WI.2006.142
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement, but finding clusters that are semantically meaningful to users is difficult. In this paper we describe a new web page clustering algorithm, QDC, which uses the user's query as part of a reliable measure of cluster quality. The new algorithm has five key innovations: a new query directed cluster quality guide that uses the relationship between clusters and the query, an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap, a new cluster splitting method that fixes the cluster chaining or cluster drifting problem, an improved heuristic for cluster selection that uses the query directed cluster quality guide, and a new method of improving clusters by ranking the pages by relevance to the cluster. We evaluate QDC by comparing its clustering performance against that of four other algorithms on eight data sets (four use full text data and four use snippet data) by using eleven different external evaluation measurements. We also evaluate QDC by informally analysing its real world usability and performance through comparison with six other algorithms on four data sets. QDC provides a substantial performance improvement over other web page clustering algorithms.
Pages: 202 / +
Page count: 2