Deep web data extraction based on visual information processing

Cited by: 3
Authors
Liu J. [1]
Lin L. [1]
Cai Z. [1]
Wang J. [2, 3]
Kim H.-J. [4]
Affiliations
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
[2] Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing
[3] College of Information Engineering, Yangzhou University, Yangzhou
[4] Business Administration Research Institute, Sungshin W. University, Seoul
Keywords
CNN; Data extraction; Deep web; Visual information
DOI
10.1007/s12652-017-0587-0
Abstract
With the rapid development of technology, the Web has become the largest encyclopedic database. Although users can conveniently get information on the surface web by using applications such as browsers, it is hard to retrieve information from the deep web. The deep web requires a user to submit a query to the server, which retrieves information from its database to generate the result webpage. Thus, methods different from traditional Web surfing are needed to conduct data extraction in the deep web. Most existing deep web data extraction methods are based on DOM tree analysis. In this paper, to fully utilize the visual information contained in a webpage, a data region locating method based on a convolutional neural network and a visual-information-based segmentation algorithm are proposed. To verify the efficiency of the proposed method, we apply it to real-world commercial websites to perform data extraction. Experiments on the data region location model, data extraction, and data item alignment verify that the proposed method can effectively improve the accuracy of data region location and the efficiency of data extraction. © Springer-Verlag GmbH Germany 2017.
Pages: 1481 - 1491
Number of pages: 10