Factors affecting web page similarity

Cited by: 0
Authors
Tombros, A [1]
Ali, ZS [1]
Affiliations
[1] Queen Mary Univ London, Dept Comp Sci, London E1 4NS, England
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Tools that allow effective information organisation, access and navigation are becoming increasingly important on the Web. Similarity between web pages is a concept that is central to such tools. In this paper, we examine the effect that content and layout-related aspects of web pages have on web page similarity. We consider the textual content contained within common HTML tags, the structural layout of pages, and the query terms contained within pages. Our study shows that combinations of factors can yield more promising results than individual factors, and that different aspects of web pages affect similarities between pages in different ways. We found a number of factors that, when taken into account, can result in effective measures of similarity between web pages. Query information, in particular, proved to be important for the effective organisation of web pages.
Pages: 487-501
Page count: 15
Related papers
  • [1] Clustering web sessions by levels of page similarity
    Nichele, Caren Moraes
    Becker, Karin
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 346 - 350
  • [2] Visual similarity comparison for Web page retrieval
    Takama, Y
    Mitsuhashi, N
    2005 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2005, : 301 - 304
  • [3] Personalized web page ranking using trust and similarity
    Srour, Lara
    Kayssi, Ayman
    Chehab, Ali
    2007 IEEE/ACS INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1 AND 2, 2007, : 454 - +
  • [4] Human factors for web page design
    Billard, T
    SOCIETY FOR TECHNICAL COMMUNICATION 44TH ANNUAL CONFERENCE, 1997 PROCEEDINGS, 1997, : 322 - 325
  • [5] Measuring Web Page Similarity Based on Textual and Visual Properties
    Bartik, Vladimir
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II, 2012, 7268 : 13 - 21
  • [6] Layout-based computation of web page similarity ranks
    Bozkir, Ahmet Selman
    Sezer, Ebru Akcapinar
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES, 2018, 110 : 95 - 114
  • [7] Proposal of Seam Degree and Content Similarity for Web Page Segmentation
    Zeng, Jun
    Flanagan, Brendan
    Xiong, Qingyu
    Wen, Junhao
    Hirokawa, Sachio
    2013 SECOND IIAI INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2013), 2013, : 9 - 14
  • [8] Web Phishing Detection Based on Page Spatial Layout Similarity
    Zhang, Weifeng
    Lu, Hua
    Xu, Baowen
    Yang, Hongji
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2013, 37 (03) : 231 - 244
  • [9] Algorithm of Web Page Similarity Comparison Based on Visual Block
    Li, Xingchen
    Zhang, Weizhe
    Wang, Desheng
    Zhang, Bin
    He, Hui
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2019, 16 (03) : 815 - 830
  • [10] An Improved HITS Algorithm Based on Analysis of Web Page Links and Web Content Similarity
    Yang, Weiming
    2016 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW), 2016, : 147 - 150