Accelerating the process of web page segmentation via template clustering

Authors
Zeleny J. [1]
Burget R. [2]
Affiliations
[1] Faculty of Information Technology, Brno University of Technology, Brno
[2] Faculty of Information Technology, Brno University of Technology, IT4Innovations Centre of Excellence, Brno
Keywords
Clustering; Page segmentation; Segmentation performance; Template; Template detection; VIPS; Vision-based page segmentation; Web page preprocessing; Web page segmentation
DOI
10.1504/IJIIDS.2016.075424
Abstract
Page segmentation is often one of the initial steps when performing data mining on a web page. In recent years, several page segmentation methods based on the visual perception of the web page have been developed. In this paper, we propose a generic method for improving the efficiency of virtually all vision-based segmentation algorithms. Our method, called cluster-based page segmentation, builds on the widespread concept of web templates: it clusters web pages and performs the segmentation on the clusters rather than on each page in a cluster. To demonstrate the efficiency of our method, we present experimental results gathered using three different vision-based segmentation algorithms. Copyright © 2016 Inderscience Enterprises Ltd.
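The idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how pages are clustered, so the sketch assumes each page is represented by a set of DOM tag paths and that pages whose Jaccard similarity reaches a threshold share a template. The names `jaccard`, `cluster_by_template`, and `segment_clusters`, and the threshold of 0.75, are all hypothetical. The `segment` argument stands in for any expensive vision-based segmenter such as VIPS.

```python
from typing import Callable, Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets of DOM tag paths."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_template(
    pages: Dict[str, FrozenSet[str]], threshold: float = 0.75
) -> List[List[str]]:
    """Greedy clustering (an assumption, not the paper's method):
    a page joins the first cluster whose representative, the first
    page placed in that cluster, is similar enough."""
    clusters: List[List[str]] = []
    for url, paths in pages.items():
        for cluster in clusters:
            if jaccard(paths, pages[cluster[0]]) >= threshold:
                cluster.append(url)
                break
        else:
            # No existing cluster matched, so this page starts a new one.
            clusters.append([url])
    return clusters


def segment_clusters(
    pages: Dict[str, FrozenSet[str]],
    segment: Callable[[FrozenSet[str]], object],
    threshold: float = 0.75,
) -> Dict[str, object]:
    """Run the expensive segmenter once per cluster and reuse the
    result for every page in that cluster."""
    result: Dict[str, object] = {}
    for cluster in cluster_by_template(pages, threshold):
        layout = segment(pages[cluster[0]])  # one call per cluster
        for url in cluster:
            result[url] = layout
    return result


# Toy data: two pages that share a template and one that does not.
pages = {
    "shop/item1": frozenset({"body/header", "body/nav", "body/main", "body/footer", "body/aside"}),
    "shop/item2": frozenset({"body/header", "body/nav", "body/main", "body/footer"}),
    "blog/post": frozenset({"body/div/table", "body/div/form"}),
}

calls: List[int] = []


def toy_segment(paths: FrozenSet[str]) -> List[str]:
    """Stand-in for a vision-based segmenter; records each call."""
    calls.append(1)
    return sorted(paths)


layouts = segment_clusters(pages, toy_segment)
print(len(pages), "pages segmented with", len(calls), "segmenter calls")
```

In this sketch the saving comes from calling the segmenter once per cluster rather than once per page. A real system would also have to map the shared segmentation back onto each page's own content, which the sketch leaves out.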
Pages: 134-154
Page count: 20