Query directed web page clustering

Cited by: 9
Authors
Crabtree, Daniel [1 ]
Andreae, Peter [1 ]
Gao, Xiaoying [1 ]
Affiliations
[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, Wellington, New Zealand
Keywords
DOI
10.1109/WI.2006.142
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement, but finding clusters that are semantically meaningful to users is difficult. In this paper we describe a new web page clustering algorithm, QDC, which uses the user's query as part of a reliable measure of cluster quality. The new algorithm has five key innovations: a new query directed cluster quality guide that uses the relationship between clusters and the query; an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap; a new cluster splitting method that fixes the cluster chaining or cluster drifting problem; an improved heuristic for cluster selection that uses the query directed cluster quality guide; and a new method of improving clusters by ranking the pages by relevance to the cluster. We evaluate QDC by comparing its clustering performance against that of four other algorithms on eight data sets (four use full text data and four use snippet data), using eleven different external evaluation measurements. We also evaluate QDC by informally analysing its real world usability and performance through comparison with six other algorithms on four data sets. QDC provides a substantial performance improvement over other web page clustering algorithms.
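The merging idea in the abstract (combining clusters only when their descriptions are similar as well as their pages overlapping) can be illustrated with a minimal Python sketch. This is not QDC's actual method: the abstract gives no formulas, so the `Cluster` representation, the use of Jaccard similarity for both tests, and the threshold values `min_overlap` and `min_desc_sim` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical representation: a set of descriptive words and a set of page ids.
    description: frozenset
    pages: frozenset


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(c1, c2, min_overlap=0.5, min_desc_sim=0.3):
    """Require BOTH page overlap and description similarity (assumed thresholds)."""
    return (jaccard(c1.pages, c2.pages) >= min_overlap
            and jaccard(c1.description, c2.description) >= min_desc_sim)


def merge(c1, c2):
    """Union the descriptions and the page sets of two clusters."""
    return Cluster(c1.description | c2.description, c1.pages | c2.pages)


# Toy data for the query "jaguar": two car clusters and one animal cluster.
car_a = Cluster(frozenset({"jaguar", "car"}), frozenset({1, 2, 3, 4}))
car_b = Cluster(frozenset({"jaguar", "car", "cars"}), frozenset({2, 3, 4, 5}))
animal = Cluster(frozenset({"cat", "animal"}), frozenset({2, 3, 4, 6}))

merge_cars = should_merge(car_a, car_b)     # overlapping pages, similar descriptions
merge_mixed = should_merge(car_a, animal)   # overlapping pages, unrelated descriptions
```

In this toy data the animal cluster shares as many pages with `car_a` as `car_b` does, so an overlap-only test would treat both pairs alike. The description check is what keeps the unrelated cluster separate.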
Pages: 202 / +
Page count: 2