A tool for data cube construction from structurally heterogeneous XML documents

Cited by: 8
Authors
Nappila, Turkka [1]
Jarvelin, Kalervo [2]
Niemi, Timo [1]
Affiliations
[1] Univ Tampere, Dept Comp Sci, FIN-33014 Tampere, Finland
[2] Univ Tampere, Dept Informat Studies, FIN-33014 Tampere, Finland
DOI
10.1002/asi.20756
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (Extensible Markup Language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail and are thus impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We shall, however, show that for certain (not uncommon) types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
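As a rough illustration of the LCA idea mentioned in the abstract, the sketch below finds the lowest common ancestor of two elements in a small XML tree using Python's standard `xml.etree.ElementTree`. It is not the SPC strategy of the article, whose details are not given in this record. The sample document, the element names, and the helper `lowest_common_ancestor` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def lowest_common_ancestor(root, a, b):
    """Return the deepest element that is an ancestor-or-self of both a and b."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: node for node in root.iter() for child in node}

    def path_to_root(node):
        # Walk upward from node until the root (which has no parent entry).
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    ancestors_of_a = path_to_root(a)
    # The first ancestor of b that also lies on a's path is the LCA.
    for node in path_to_root(b):
        if any(node is x for x in ancestors_of_a):
            return node
    return None


# Hypothetical sample document (not taken from the article).
doc = ET.fromstring(
    "<library>"
    "<article><title>First title</title><author>Author One</author></article>"
    "<article><title>Second title</title><author>Author Two</author></article>"
    "</library>"
)

titles = doc.findall(".//title")
authors = doc.findall(".//author")

# Title and author of the same article meet at that <article> element.
same = lowest_common_ancestor(doc, titles[0], authors[0])
# Title and author from different articles meet only at the document root.
cross = lowest_common_ancestor(doc, titles[0], authors[1])
print(same.tag, cross.tag)
```

In this toy case, a pair of matches that belong together is connected at a small, meaningful subtree, while an unrelated pair is connected only at the root. An LCA-based evaluator has to tell such cases apart without knowing the document structure in advance, which is where irregular documents can cause trouble.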
Pages: 435-449
Page count: 15