A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5
Authors
Cauteruccio, Francesco [1]
Lo Giudice, Paolo [2]
Musarella, Lorenzo [2]
Terracina, Giorgio [1]
Ursino, Domenico [3]
Virgili, Luca [3]
Affiliations
[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy
[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy
Keywords
Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE;
DOI
10.1142/S0219622020500182
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making over sources characterized by disparate formats. In the past, a large number and variety of approaches to deriving interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
Pages: 849 - 889
Page count: 41
Related papers
50 in total
  • [41] Information discovery from semi-structured sources - Application to astronomical literature
    Dkaki, T
    Dousset, B
    Egret, D
    Mothe, J
    COMPUTER PHYSICS COMMUNICATIONS, 2000, 127 (2-3) : 198 - 206
  • [42] Universal data capture technology from semi-structured forms
    Tuganbaev, D
    Pakhchanian, A
    Deryagin, D
    EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 458 - 462
  • [43] Efficient substructure discovery from large semi-structured data
    Asai, T
    Abe, K
    Kawasoe, S
    Sakamoto, H
    Arimura, H
    Arikawa, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (12): : 2754 - 2763
  • [44] Data extraction from semi-structured web pages by clustering
    Vuong, Le Phong Bao
    Gao, Xiaoying
    Zhang, Mengjie
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 374 - +
  • [45] Efficient substructure discovery from large semi-structured data
    Asai, T
    Abe, K
    Kawasoe, S
    Arimura, H
    Sakamoto, H
    Arikawa, S
    PROCEEDINGS OF THE SECOND SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2002, : 158 - 174
  • [46] Discovering frequent substructures from hierarchical semi-structured data
    Cong, G
    Yi, L
    Liu, B
    Wang, K
    PROCEEDINGS OF THE SECOND SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2002, : 175 - +
  • [47] Knowledge discovery from semi-structured data for conceptual organization
    Gupta, S.
    Goyal, R.
    Shubham, K.
    Dey, L.
    Malik, A.
    Chaudhury, S.
    Bhattacharya, S.
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS PROCEEDINGS, 2006, : 291 - +
  • [48] Extraction and transformation of data from semi-structured text files using a declarative approach
    Raminhos, R.
    Moura-Pires, J.
    ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: DATABASES AND INFORMATION SYSTEMS INTEGRATION, 2007, : 199 - +
  • [49] An Overview on XML Semantic Disambiguation from Unstructured Text to Semi-Structured Data: Background, Applications, and Ongoing Challenges
    Tekli, Joe
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (06) : 1383 - 1407
  • [50] Comparing avian species richness estimates from structured and semi-structured citizen science data
    Fang-Yu Shen
    Tzung-Su Ding
    Jo-Szu Tsai
    Scientific Reports, 13