A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5
Authors
Cauteruccio, Francesco [1]
Lo Giudice, Paolo [2]
Musarella, Lorenzo [2]
Terracina, Giorgio [1]
Ursino, Domenico [3]
Virgili, Luca [3]
Affiliations
[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy
[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy
Keywords
Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE;
DOI
10.1142/S0219622020500182
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making over sources characterized by disparate formats. In the past, a wide variety of approaches to derive interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources generally involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address the interschema property derivation issue in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
Pages: 849-889
Number of pages: 41
Related papers
50 in total
  • [31] Cyclical Structure Converter (CSC): a system for handling the interaction of structured and semi-structured data sources
    Mbale, J
    Ursino, D
    Fei, XX
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2003, 9 (05) : 423 - 446
  • [32] A method of semi-automated ontology population from multiple semi-structured data sources
    Leshcheva, Irina
    Begler, Alena
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (02) : 223 - 236
  • [33] Unsupervised Extraction of Product Information from Semi-structured Sources
    Walther, Maximilian
    13TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI 2012), 2012, : 257 - 262
  • [34] Building knowledge base from semi-structured data
    Liu, Xiao-Li
    Wu, Guo-Qing
    Yang, Min
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 839 - +
  • [35] Interactive tuples extraction from semi-structured data
    Gilleron, Remi
    Marty, Patrick
    Tommasi, Marc
    Torre, Fabien
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 997 - 1004
  • [36] Interactive Data Extraction from Semi-Structured Text
    Broman, Per
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 1 - 19
  • [37] A structure-based approach to querying semi-structured data
    Fernandez, M
    Popa, L
    Suciu, D
    DATABASE PROGRAMMING LANGUAGES, 1998, 1369 : 136 - 159
  • [38] OntoExtractor: A fuzzy-based approach in clustering semi-structured data sources and metadata generation
    Cui, Z
    Damiani, E
    Leida, M
    Viviani, M
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2005, 3681 : 112 - 118
  • [39] Research on Semi-Structured and Unstructured Data Storage and Management Model for Multi-Tenant
    Hu, Xin
    Xu, Yabin
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2019, 12 (01) : 49 - 62
  • [40] Gathering services of IHWA from semi-structured web information sources
    Jeong, JS
    Oh, DI
    IC'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTERNET COMPUTING, VOLS I AND II, 2001, : 375 - 378