Web-Scale Extension of RDF Knowledge Bases from Templated Websites

Cited by: 0
Authors
Buehmann, Lorenz [1]
Usbeck, Ricardo [1, 2]
Ngomo, Axel-Cyrille Ngonga [1]
Saleem, Muhammad [1]
Both, Andreas [2]
Crescenzi, Valter [3]
Merialdo, Paolo [3]
Qiu, Disheng [3]
Affiliations
[1] Univ Leipzig, IFI AKSW, Leipzig, Germany
[2] Unister GmbH, Leipzig, Germany
[3] Univ Roma Tre, Rome, Italy
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of our framework, REX, on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
Pages: 66-81
Number of pages: 16
Related papers
50 in total
  • [31] Evento 360: Social Event Discovery from Web-scale Multimedia Collection
    Choi, Jaeyoung
    Kim, Eungchan
    Larson, Martha
    Friedland, Gerald
    Hanjalic, Alan
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 193 - 196
  • [32] An Analysis of Web-scale Discovery Services From the Perspective of User's Relevance Judgment
    Lee, Boram
    Chung, EunKyung
    JOURNAL OF ACADEMIC LIBRARIANSHIP, 2016, 42 (05): : 529 - 534
  • [33] Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
    Iscen, Ahmet
    Fathi, Alireza
    Schmid, Cordelia
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19295 - 19304
  • [34] Learning to construct knowledge bases from the World Wide Web
    Craven, M
    DiPasquo, D
    Freitag, D
    McCallum, A
    Mitchell, T
    Nigam, K
    Slattery, S
    ARTIFICIAL INTELLIGENCE, 2000, 118 (1-2) : 69 - 113
  • [35] Knowledge extraction using semantic similarity of concepts from Web of Things knowledge bases
    Muppavarapu, Vamsee
    Ramesh, Gowtham
    Gyrard, Amelie
    Noura, Mahda
    DATA & KNOWLEDGE ENGINEERING, 2021, 135
  • [36] Mining Rules with Constants from Large Scale Knowledge Bases
    Wang, Xuan
    Zhang, Jingjing
    Chen, Jinchuan
    Fan, Ju
    CONCEPTUAL MODELING, ER 2018, 2018, 11157 : 521 - 535
  • [37] Entity Extraction with Knowledge from Web Scale Corpora
    Wen, Zeyi
    Huang, Zeyu
    Zhang, Rui
    DATABASES THEORY AND APPLICATIONS, ADC 2020, 2020, 12008 : 173 - 185
  • [39] A Web Browser Extension for growing-up Ontological Knowledge from Traditional Web Content
    Pazienza, Maria Teresa
    Pennacchiotti, Marco
    Stellato, Armando
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2229 - 2235
  • [40] Knowledge bases built on web languages from the point of view of predicate logics
    Vajgl, Marek
    Lukasova, Alena
    Zacek, Martin
    APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2017, 1836