Accelerating the process of web page segmentation via template clustering

Authors
Zeleny J. [1]
Burget R. [2]
Affiliations
[1] Faculty of Information Technology, Brno University of Technology, Brno
[2] Faculty of Information Technology, Brno University of Technology, IT4Innovations Centre of Excellence, Brno
Keywords
Clustering; Page segmentation; Segmentation performance; Template; Template detection; VIPS; Vision-based page segmentation; Web page preprocessing; Web page segmentation
DOI
10.1504/IJIIDS.2016.075424
Abstract
Page segmentation is often one of the initial steps in data mining on a web page. In recent years, several page segmentation methods based on visual perception of the web page have been developed. In this paper, we propose a generic method for improving the efficiency of virtually all vision-based segmentation algorithms. Our method, called cluster-based page segmentation, builds on the widespread concept of web templates: it clusters web pages and performs the segmentation on the clusters instead of on each page in a cluster. To demonstrate the efficiency of our algorithm, we present experimental results gathered using three different vision-based segmentation algorithms. Copyright © 2016 Inderscience Enterprises Ltd.
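The idea in the abstract, segmenting per template cluster rather than per page, can be illustrated with a minimal sketch. This is not the authors' algorithm: the page signature (a set of DOM tag paths), the Jaccard similarity, the 0.8 threshold, the greedy clustering, and the names `cluster_by_template` and `segment_clusters` are all assumptions made for illustration. The `segment` argument stands in for any expensive vision-based segmenter.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_template(
    pages: Dict[str, FrozenSet[str]], threshold: float = 0.8
) -> List[List[str]]:
    """Greedy single-pass clustering (an assumed stand-in for template detection).

    A page joins the first cluster whose representative (its first member)
    has a signature at least `threshold` similar; otherwise it starts a new one.
    """
    clusters: List[List[str]] = []
    for url, signature in pages.items():
        for members in clusters:
            if jaccard(signature, pages[members[0]]) >= threshold:
                members.append(url)
                break
        else:
            clusters.append([url])
    return clusters


def segment_clusters(
    pages: Dict[str, FrozenSet[str]],
    segment: Callable[[str], List[str]],
    threshold: float = 0.8,
) -> Tuple[Dict[str, List[str]], int]:
    """Run `segment` once per cluster and reuse its result for every member.

    Returns the per-page results and the number of segmenter calls made.
    """
    results: Dict[str, List[str]] = {}
    calls = 0
    for members in cluster_by_template(pages, threshold):
        segmentation = segment(members[0])  # expensive step, once per cluster
        calls += 1
        for url in members:
            results[url] = segmentation
    return results, calls


# Hypothetical input: two article pages sharing one template, plus one other page.
article = frozenset({
    "html/body/header", "html/body/nav",
    "html/body/main/article", "html/body/footer",
})
demo_pages = {
    "/article/1": article,
    "/article/2": article,
    "/about": frozenset({"html/body/header", "html/body/main/section"}),
}

# Placeholder segmenter: a real one would render the page and find visual blocks.
demo_results, demo_calls = segment_clusters(
    demo_pages, lambda url: ["blocks of " + url]
)
```

In this toy input the two article pages fall into one cluster, so the segmenter runs twice for three pages. The saving grows with the number of pages per template, at the cost of assuming that pages in a cluster really do share a layout.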
Pages: 134-154
Page count: 20