Automatically maintaining navigation sequences for querying semi-structured web sources

Cited by: 5
Authors
Pan, Alberto [1]
Raposo, Juan [1]
Alvarez, Manuel [1]
Carneiro, Victor [1]
Bellas, Fernando [1]
Affiliations
[1] Univ A Coruna, Dept Informat & Commun Technol, Fac Informat, La Coruna 15071, Spain
Keywords
technologies of DBs/mediators and wrappers; Data mining/Web-based information; Web/Web-based information systems
DOI
10.1016/j.datak.2007.04.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database and embeds them into an HTML template. For software programs to gain full benefit from these "semi-structured" Web sources, wrapper programs must be built to provide a "machine-readable" view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper; automatic maintenance is therefore an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains unaddressed to the best of our knowledge. In this paper, we propose a set of novel techniques to fill this gap. (c) 2007 Elsevier B.V. All rights reserved.
Pages: 795-810
Page count: 16