Automatically maintaining navigation sequences for querying semi-structured web sources

Cited by: 5
Authors
Pan, Alberto [1]
Raposo, Juan [1]
Alvarez, Manuel [1]
Carneiro, Victor [1]
Bellas, Fernando [1]
Affiliations
[1] Univ A Coruna, Dept Informat & Commun Technol, Fac Informat, La Coruna 15071, Spain
Keywords
technologies of DBs/mediators and wrappers; Data mining/Web-based information; Web/Web-based information systems;
DOI
10.1016/j.datak.2007.04.009
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database and embeds them into an HTML template. For software programs to gain full benefit from these "semi-structured" Web sources, wrapper programs must be built to provide a "machine-readable" view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper; automatic maintenance is therefore an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains, to the best of our knowledge, unaddressed. In this paper, we propose a set of novel techniques to fill this gap. © 2007 Elsevier B.V. All rights reserved.
Pages: 795-810
Number of pages: 16