A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Cited by: 5
Authors
Cauteruccio, Francesco [1]
Lo Giudice, Paolo [2]
Musarella, Lorenzo [2]
Terracina, Giorgio [1]
Ursino, Domenico [3]
Virgili, Luca [3]
Affiliations
[1] Univ Calabria, Dipartimento Matemat & Informat, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Mediterranea Reggio Calabria, Dipartimento Ingn Informaz Infrastrutture & Energ, Via Univ,25 Gia Salita Melissari, I-89124 Reggio Di Calabria, CF, Italy
[3] Univ Politecn Marche, Dipartimento Ingn Informaz, Via Brecce Bianche 12, I-60131 Ancona, Italy
Keywords
Unstructured sources; interschema property derivation; structuring unstructured data; big data; METADATA QUALITY; DIGITAL REPOSITORIES; SIMILARITY; CLASSIFICATION; CONSTRUCTION; INTEGRATION; SYSTEM; MODEL; DIKE;
DOI
10.1142/S0219622020500182
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making across sources characterized by disparate formats. In the past, a wide variety of approaches for deriving interschema properties from structured and semi-structured data have been proposed. However, it is currently estimated that more than 80% of data sources are unstructured. Furthermore, the number of sources typically involved in an interaction is much higher than in the past. As a consequence, new approaches are needed to address interschema property derivation in this new scenario. In this paper, we aim to contribute to this setting by proposing an approach capable of uniformly extracting interschema properties from a huge number of structured, semi-structured and unstructured sources.
Pages: 849 - 889
Number of pages: 41
Related papers
50 in total
  • [1] Supporting structured, semi-structured and unstructured data in digital libraries
    Sánchez, JA
    Proal, C
    Maldonado-Naude, F
    PROCEEDINGS OF THE FIFTH MEXICAN INTERNATIONAL CONFERENCE IN COMPUTER SCIENCE (ENC 2004), 2004, : 368 - 375
  • [2] Converting unstructured and semi-structured data into knowledge
    Rusu, Octavian
    Halcu, Ionela
    Grigoriu, Oana
    Neculoiu, Giorgian
    Sandulescu, Virginia
    Marinescu, Mariana
    Marinescu, Viorel
    2013 ROEDUNET INTERNATIONAL CONFERENCE (ROEDUNET): NETWORKING IN EDUCATION, 11TH EDITION, 2013,
  • [3] An approach to extracting complex knowledge patterns among concepts belonging to structured, semi-structured and unstructured sources in a data lake
    Lo Giudice, Paolo
    Musarella, Lorenzo
    Sofo, Giuseppe
    Ursino, Domenico
    INFORMATION SCIENCES, 2019, 478 : 606 - 626
  • [4] An automated integration approach for semi-structured and structured data
    Lim, SJ
    Ng, YK
    PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON COOPERATIVE DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2000, : 12 - 21
  • [5] WICCAO: From semi-structured data to structured data
    Li, Z
    Ng, WK
    11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOP ON THE ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 2004, : 86 - 93
  • [6] Integrating of structured, semi-structured and unstructured data in natural and build environmental engineering
    Barbulescu, Mihai
    Grigoriu, Ramona-Oana
    Halcu, Ionela
    Neculoiu, Giorgian
    Sandulescu, Virginia Cristiana
    Marinescu, Mariana
    Marinescu, Viorel
    2013 ROEDUNET INTERNATIONAL CONFERENCE (ROEDUNET): NETWORKING IN EDUCATION, 11TH EDITION, 2013,
  • [7] A Proposed Technique for Conversion of Unstructured Agro-data to Semi-structured or Structured data
    Sambrekar, Kuldeep
    Rajpurohit, Vijay. S.
    Joshi, Jui
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [8] Data Integration Approach for Semi-structured and Structured Data (Linked Data)
    Kettouch, Mohamed Salah
    Luca, Cristina
    Hobbs, Mike
    Fatima, Arooj
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2015, : 820 - 825
  • [9] Integrating unnormalised semi-structured data sources
    Kittivoravitkul, S
    McBrien, P
    ADVANCED INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 2005, 3520 : 460 - 474
  • [10] Ontology population from unstructured and semi-structured texts
    Yoon, Hee-Geun
    Han, Yong Jin
    Park, Seong-Bae
    Park, Se-Young
    ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, 2007, : 135 - +