Query directed web page clustering

Times cited: 9
Authors
Crabtree, Daniel [1]
Andreae, Peter [1]
Gao, Xiaoying [1]
Affiliation
[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, Wellington, New Zealand
DOI
10.1109/WI.2006.142
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement, but finding clusters that are semantically meaningful to users is difficult. In this paper we describe a new web page clustering algorithm, QDC, which uses the user's query as part of a reliable measure of cluster quality. The new algorithm has five key innovations: a new query-directed cluster quality guide that uses the relationship between clusters and the query; an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap; a new cluster splitting method that fixes the cluster chaining (cluster drifting) problem; an improved heuristic for cluster selection that uses the query-directed cluster quality guide; and a new method of improving clusters by ranking the pages by relevance to the cluster. We evaluate QDC by comparing its clustering performance against that of four other algorithms on eight data sets (four with full-text data and four with snippet data), using eleven different external evaluation measurements. We also evaluate QDC by informally analysing its real-world usability and performance through comparison with six other algorithms on four data sets. QDC provides a substantial performance improvement over other web page clustering algorithms.
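The abstract does not give QDC's formulas. As a rough illustration of the idea behind the improved merging step (requiring similar cluster descriptions in addition to page overlap before two clusters are merged), here is a minimal Python sketch. The `Cluster` type, the overlap and Jaccard measures, the threshold values, and the toy "jaguar" data are all hypothetical choices for illustration, not the measures defined in the paper.

```python
from dataclasses import dataclass

# Hypothetical sketch: NOT QDC's actual merge criterion, only the general idea
# of combining page overlap with cluster-description similarity.


@dataclass(frozen=True)
class Cluster:
    pages: frozenset[str]        # IDs of the web pages in the cluster
    description: frozenset[str]  # terms that label the cluster


def overlap(a: Cluster, b: Cluster) -> float:
    """Share of the smaller cluster's pages that also belong to the other cluster."""
    smaller = min(len(a.pages), len(b.pages))
    if smaller == 0:
        return 0.0
    return len(a.pages & b.pages) / smaller


def description_similarity(a: Cluster, b: Cluster) -> float:
    """Jaccard similarity of the two clusters' description terms."""
    union = a.description | b.description
    if not union:
        return 0.0
    return len(a.description & b.description) / len(union)


def should_merge(a: Cluster, b: Cluster,
                 min_overlap: float = 0.5,
                 min_description_sim: float = 0.2) -> bool:
    """Merge only if the clusters share pages AND describe similar topics.

    The thresholds are assumed values chosen for this example.
    """
    return (overlap(a, b) >= min_overlap
            and description_similarity(a, b) >= min_description_sim)


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Combine two clusters by taking the union of pages and of description terms."""
    return Cluster(a.pages | b.pages, a.description | b.description)


# Made-up clusters for the ambiguous query "jaguar".
jaguar_car = Cluster(frozenset({"p1", "p2", "p3"}), frozenset({"jaguar", "car"}))
jaguar_xj = Cluster(frozenset({"p2", "p3", "p4"}), frozenset({"jaguar", "car", "xj"}))
big_cat = Cluster(frozenset({"p3", "p5"}), frozenset({"cat", "wildlife"}))

print(should_merge(jaguar_car, jaguar_xj))  # True: shared pages and shared terms
print(should_merge(jaguar_car, big_cat))    # False: overlap is 0.5, but no shared terms
```

In the second pair, half of the smaller cluster's pages are shared, so an overlap-only rule at this threshold would merge them; the description check keeps the car and animal senses of the query in separate clusters.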
Pages: 202 / +
Number of pages: 2
Related papers (50 in total)
  • [21] Enhancing an Incremental Clustering Algorithm for Web Page Collections
    Shaw, Gavin
    Xu, Yue
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 81 - 84
  • [22] Web Page Prediction by Clustering and Integrated Distance Measure
    Poornalatha, G.
    Raghavendra, Prakash S.
    2012 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2012, : 1349 - 1354
  • [23] A feature reduction technique for improved web page clustering
    Mohamed, Ehab Abdel-Hamid
    El-Beltagy, Samhaa R.
    El-Gamal, Salwa
    2006 INNOVATIONS IN INFORMATION TECHNOLOGY, 2006, : 280 - +
  • [24] A New Clustering Algorithm for Deep Web Query Interfaces
    Chao, L. V.
    Lin Peiguang
    Nie Peiyao
    ITESS: 2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES, PT 2, 2008, : 661 - 668
  • [25] AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION
    Fard, Amin Milani
    Wang, Ke
    SECRYPT 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY, 2010, : 109 - 119
  • [26] Information navigation on the web by clustering and summarizing query results
    Roussinov, DG
    Chen, HC
    INFORMATION PROCESSING & MANAGEMENT, 2001, 37 (06) : 789 - 816
  • [27] Query Log Driven Web Search Results Clustering
    Moreno, Jose G.
    Dias, Gael
    Cleuziou, Guillaume
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, : 777 - 786
  • [28] Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures
    Lin, Cindy Xide
    Yu, Yintao
    Han, Jiawei
    Liu, Bing
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 222 - +
  • [29] Web page sorting algorithm based on query keyword distance relation
    Yang, Han
    Cui, HongGang
    Tang, Hao
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [30] Query recommendation using large-scale web access logs and web page archive
    Li, Lin
    Otsuka, Shingo
    Kitsuregawa, Masaru
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2008, 5181 : 134 - +