Accelerating the process of web page segmentation via template clustering

Times cited: 0
Authors
Zeleny J. [1]
Burget R. [2]
Affiliations
[1] Faculty of Information Technology, Brno University of Technology, Brno
[2] Faculty of Information Technology, Brno University of Technology, IT4Innovations Centre of Excellence, Brno
Keywords
Clustering; Page segmentation; Segmentation performance; Template; Template detection; VIPS; Vision-based page segmentation; Web page preprocessing; Web page segmentation
DOI
10.1504/IJIIDS.2016.075424
Abstract
Page segmentation is often one of the initial steps in mining data from a web page. In recent years, several page segmentation methods based on the visual perception of the web page have been developed. In this paper, we propose a generic method for improving the efficiency of virtually all vision-based segmentation algorithms. Our method, called cluster-based page segmentation, exploits the widespread use of web templates: it clusters web pages and performs the segmentation on each cluster instead of on every page in the cluster. To demonstrate the efficiency of our algorithm, we present experimental results obtained with three different vision-based segmentation algorithms. Copyright © 2016 Inderscience Enterprises Ltd.
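The abstract describes cluster-based page segmentation only at a high level. The Python sketch below illustrates the general principle under assumptions of our own and is not the authors' implementation. Each page is reduced to a hypothetical set of structural tag paths. Pages are grouped greedily by Jaccard similarity, a swapped-in clustering technique that the paper may not use. A caller-supplied segmentation function then runs once per cluster, on the paths shared by all pages in that cluster, and its result is reused for every member. The names `cluster_by_template` and `segment_with_clusters` and the `threshold` parameter are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical page representation (an assumption, not from the paper):
# the set of structural tag paths found in the page, e.g. "body/div/ul/li".
Page = FrozenSet[str]


def jaccard(a: Page, b: Page) -> float:
    """Jaccard similarity of two path sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_template(pages: List[Page], threshold: float = 0.8) -> List[List[int]]:
    """Greedy single-pass clustering (a stand-in technique): each page joins the
    first cluster whose representative, i.e. its first member, is at least
    `threshold` similar; otherwise the page starts a new cluster."""
    clusters: List[List[int]] = []
    for i, page in enumerate(pages):
        for cluster in clusters:
            if jaccard(pages[cluster[0]], page) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def segment_with_clusters(
    pages: List[Page],
    segment: Callable[[Page], List[str]],
    threshold: float = 0.8,
) -> Dict[int, List[str]]:
    """Call the (expensive) segmenter once per cluster instead of once per page,
    and give every page in the cluster the same result."""
    results: Dict[int, List[str]] = {}
    for cluster in cluster_by_template(pages, threshold):
        # Hypothetical simplification: treat the paths shared by all members
        # as the cluster's common template and segment only that.
        template = frozenset.intersection(*(pages[i] for i in cluster))
        blocks = segment(template)
        for i in cluster:
            results[i] = blocks
    return results


if __name__ == "__main__":
    calls = []

    def toy_segment(template: Page) -> List[str]:
        # Toy segmenter: records each call and returns the sorted paths.
        calls.append(template)
        return sorted(template)

    a = frozenset({"body/div/h1", "body/div/p", "body/ul/li"})
    b = a | {"body/div/img"}  # same template plus one extra element
    c = frozenset({"body/table/tr/td"})  # unrelated layout
    out = segment_with_clusters([a, b, c], toy_segment, threshold=0.7)
    print(len(calls))  # → 2 (three pages, two clusters)
```

With three pages and two clusters, the segmenter runs twice rather than three times. The saving grows with the number of pages that share a template, which is the effect the abstract attributes to the method.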
Pages: 134-154
Number of pages: 20