A tool for data cube construction from structurally heterogeneous XML documents

Cited by: 8
Authors
Nappila, Turkka [1]
Jarvelin, Kalervo [2]
Niemi, Timo [1]
Affiliations
[1] Univ Tampere, Dept Comp Sci, FIN-33014 Tampere, Finland
[2] Univ Tampere, Dept Informat Studies, FIN-33014 Tampere, Finland
DOI
10.1002/asi.20756
Chinese Library Classification (CLC) number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (Extensible Markup Language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail and are thus impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We show, however, that for certain, not uncommon, types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
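To make the contrast in the abstract concrete, the following is a minimal sketch of why a fixed path query breaks on structurally heterogeneous documents while a lowest-common-ancestor lookup does not. The two sample documents, the element names, and the helper functions are hypothetical and chosen only for illustration. This is plain LCA computation, not the article's SPC strategy, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents holding the same facts in different structures.
DOC_A = "<bib><article><title>T1</title><year>2007</year></article></bib>"
DOC_B = ("<bib><group><year>2007</year>"
         "<article><title>T1</title></article></group></bib>")


def parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}


def path_to_root(node, parents):
    """Return the list of elements from node up to the document root."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def lca(a, b, parents):
    """Return the lowest common ancestor of elements a and b."""
    ancestors_of_a = path_to_root(a, parents)
    for candidate in path_to_root(b, parents):
        if candidate in ancestors_of_a:
            return candidate
    return None


def lca_tag(doc, tag1, tag2):
    """Tag of the LCA of the first elements named tag1 and tag2 in doc."""
    root = ET.fromstring(doc)
    parents = parent_map(root)
    a = root.find(f".//{tag1}")
    b = root.find(f".//{tag2}")
    return lca(a, b, parents).tag


def path_hits(doc, path):
    """Number of elements matched by a fixed, structure-dependent path."""
    return len(ET.fromstring(doc).findall(path))


if __name__ == "__main__":
    for name, doc in (("A", DOC_A), ("B", DOC_B)):
        print(name, path_hits(doc, "./article/year"),
              lca_tag(doc, "title", "year"))
```

The fixed path `./article/year` matches only in the first document, whereas the LCA lookup connects `title` and `year` in both without knowing where each element sits. The abstract's point is that this connection can still be the wrong one for some document types, which is the case the SPC strategy is designed to address.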
Pages: 435-449
Page count: 15