A tool for data cube construction from structurally heterogeneous XML documents

Cited by: 8
Authors
Nappila, Turkka [1]
Jarvelin, Kalervo [2]
Niemi, Timo [1]
Affiliations
[1] Univ Tampere, Dept Comp Sci, FIN-33014 Tampere, Finland
[2] Univ Tampere, Dept Informat Studies, FIN-33014 Tampere, Finland
DOI
10.1002/asi.20756
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (Extensible Markup Language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in great detail and are, thus, effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We shall show, however, that for certain, not uncommon, types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
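To make the LCA idea mentioned in the abstract concrete, here is a minimal Python sketch of pairing values by their lowest common ancestor instead of by a fixed path. It is an illustration only: it is not the authors' SPC strategy, nor any of the LCA-based systems the abstract refers to. The sample documents `doc_a` and `doc_b`, the tag names, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that record titles and years with different structure.
doc_a = ("<articles>"
         "<article><title>T1</title><year>2008</year></article>"
         "<article><title>T2</title><year>2007</year></article>"
         "</articles>")
doc_b = ("<volumes><volume><year>2008</year>"
         "<article><title>T1</title></article>"
         "<article><title>T3</title></article>"
         "</volume></volumes>")

def parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    """Return the chain from node up to the root, node first."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path

def lowest_common_ancestor(a, b, parents):
    """Return the deepest element that contains both a and b."""
    seen = {id(n) for n in ancestors(a, parents)}
    for n in ancestors(b, parents):
        if id(n) in seen:
            return n
    return None

def pair_by_lca(xml_text, tag_a, tag_b):
    """Pair each tag_a element with the tag_b element sharing the deepest LCA.

    Assumes at least one tag_b element exists in the document.
    """
    root = ET.fromstring(xml_text)
    parents = parent_map(root)

    def lca_depth(a, b):
        return len(ancestors(lowest_common_ancestor(a, b, parents), parents))

    pairs = []
    for a in root.iter(tag_a):
        best = max(root.iter(tag_b), key=lambda b: lca_depth(a, b))
        pairs.append((a.text, best.text))
    return pairs

print(pair_by_lca(doc_a, "title", "year"))
print(pair_by_lca(doc_b, "title", "year"))
```

The same call works on both documents without naming any path, which is the structure independence the abstract describes. The sketch does not attempt to reproduce the problematic document types the abstract alludes to.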
Pages: 435-449
Page count: 15