Integrating HTML tables using semantic hierarchies and meta-data sets

Cited by: 2
Authors
Lim, SJ [1]
Ng, YK [1]
Yang, XC [1]
Affiliation
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
Keywords
DOI
10.1109/IDEAS.2002.1029668
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Because the Internet is a global network, there is demand for accessing closely related data without browsing through different Web documents. A significant amount of this data is presented in HTML documents. Since the data contents of HTML documents are interleaved with markup, it is not trivial to integrate closely related data from different HTML documents and provide a unified view of it. In this paper we present an approach for integrating semantically related data in any HTML tables that belong to a particular domain of interest (ID), such as house/apartment rental, by using the semantic hierarchies generated from the tables and the predefined meta-data sets that indicate related column names in ID. In our approach, we capture each data source as semi-structured data, called a semantic hierarchy, and the end result of integrating different HTML tables of ID is a unified view of the data in the tables, presented as an XML document. Besides HTML tables, our approach can be adopted by any system that integrates semi-structured data across different platforms.
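The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical meta-data set for a rental domain (`META_DATA_SETS`), maps differently named columns from two made-up tables onto shared attribute names, and writes the unified view as XML. The function names, column names, and sample rows are all illustrative assumptions, and the flat rows here do not model the nested semantic hierarchies the paper describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data set for a rental domain (an assumption, not from the
# paper): each canonical attribute maps to column names treated as equivalent.
META_DATA_SETS = {
    "rent": {"rent", "price", "monthly rent"},
    "bedrooms": {"bedrooms", "beds", "br"},
    "location": {"location", "address", "city"},
}


def canonical_name(column):
    """Return the canonical attribute for a column name, or None if unknown."""
    key = column.strip().lower()
    for canonical, variants in META_DATA_SETS.items():
        if key in variants:
            return canonical
    return None


def integrate(tables):
    """Merge rows of several tables into one XML tree with unified names."""
    root = ET.Element("listings")
    for source, rows in tables.items():
        for row in rows:
            item = ET.SubElement(root, "listing", source=source)
            for column, value in row.items():
                name = canonical_name(column)
                if name is not None:  # columns outside the meta-data sets are skipped
                    ET.SubElement(item, name).text = str(value)
    return root


# Illustrative rows standing in for data already extracted from two HTML tables.
tables = {
    "site_a": [{"Price": "900", "Beds": "2", "City": "Provo"}],
    "site_b": [{"Monthly Rent": "750", "BR": "1", "Address": "Orem"}],
}
unified = integrate(tables)
print(ET.tostring(unified, encoding="unicode"))
```

Both sample tables end up under the same element names (`rent`, `bedrooms`, `location`), which is the kind of unified view the abstract refers to; a fuller treatment would also need to handle nested headings and conflicting values.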
Pages: 160-169
Page count: 10
Related papers
50 in total
  • [1] Capturing semantic hierarchies to perform meaningful integration in HTML tables
    Li, SJ
    Liu, MC
    Wang, GR
    Peng, ZY
    ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 899 - 902
  • [2] Analysis and Interpretation of Semantic HTML Tables
    Yin, Wensheng
    Guo, Feifei
    Xu, Fan
    Chen, Xiuguo
    WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 71 - 79
  • [3] On extracting data from tables that are encoded using HTML
    Roldan, Juan C.
    Jimenez, Patricia
    Corchuelo, Rafael
    KNOWLEDGE-BASED SYSTEMS, 2020, 190
  • [4] Extracting Linked Data from HTML Tables
    Ktob, Ahmed
    Li, Zhoujun
    Bouchiha, Djelloul
    2017 IEEE 3RD INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC), 2017, : 48 - 53
  • [5] A clustering approach to extract data from HTML tables
    Jimenez, Patricia
    Roldan, Juan C.
    Corchuelo, Rafael
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (06)
  • [6] An automated change-detection algorithm for HTML documents based on semantic hierarchies
    Lim, SJ
    Ng, YK
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, : 303 - 312
  • [7] Automating the extraction of data from HTML tables with unknown structure
    Embley, DW
    Tao, C
    Liddle, SW
    DATA & KNOWLEDGE ENGINEERING, 2005, 54 (01) : 3 - 28
  • [8] An XML approach to semantically extract data from HTML tables
    Liu, JX
    Ao, ZY
    Park, HH
    Chen, YF
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 696 - 705
  • [9] An automated approach for retrieving hierarchical data from HTML tables
    Lim, SJ
    Ng, YK
    PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99, 1999, : 466 - 474
  • [10] A hybrid quantum approach to leveraging data from HTML tables
    Jimenez, Patricia
    Roldan, Juan C.
    Corchuelo, Rafael
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (02) : 441 - 474