Automatically maintaining navigation sequences for querying semi-structured web sources

Cited by: 5
Authors
Pan, Alberto [1]
Raposo, Juan [1]
Alvarez, Manuel [1]
Carneiro, Victor [1]
Bellas, Fernando [1]
Affiliation
[1] Univ A Coruna, Dept Informat & Commun Technol, Fac Informat, La Coruna 15071, Spain
Keywords
technologies of DBs/mediators and wrappers; Data mining/Web-based information; Web/Web-based information systems
DOI
10.1016/j.datak.2007.04.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database and embeds them into an HTML template. For software programs to gain full benefit from these "semi-structured" Web sources, wrapper programs must be built to provide a "machine-readable" view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper; automatic maintenance is therefore an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains, to the best of our knowledge, unaddressed. In this paper, we propose a set of novel techniques to fill this gap. © 2007 Elsevier B.V. All rights reserved.
Pages: 795-810
Page count: 16