Automatically maintaining navigation sequences for querying semi-structured web sources

Cited by: 5
Authors
Pan, Alberto [1]
Raposo, Juan [1]
Alvarez, Manuel [1]
Carneiro, Victor [1]
Bellas, Fernando [1]
Affiliation
[1] Univ A Coruna, Dept Informat & Commun Technol, Fac Informat, La Coruna 15071, Spain
Keywords
technologies of DBs/mediators and wrappers; Data mining/Web-based information; Web/Web-based information systems
DOI
10.1016/j.datak.2007.04.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database and embeds them into an HTML template. For software programs to gain full benefit from these "semi-structured" Web sources, wrapper programs must be built to provide a "machine-readable" view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper; automatic maintenance is therefore an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains unaddressed, to the best of our knowledge. In this paper, we propose a set of novel techniques to fill this gap. (c) 2007 Elsevier B.V. All rights reserved.
Pages: 795-810
Number of pages: 16