Web-Scale Extension of RDF Knowledge Bases from Templated Websites

Authors
Buehmann, Lorenz [1]
Usbeck, Ricardo [1, 2]
Ngomo, Axel-Cyrille Ngonga [1]
Saleem, Muhammad [1]
Both, Andreas [2]
Crescenzi, Valter [3]
Merialdo, Paolo [3]
Qiu, Disheng [3]
Affiliations
[1] Univ Leipzig, IFI AKSW, Leipzig, Germany
[2] Unister GmbH, Leipzig, Germany
[3] Univ Roma Tre, Rome, Italy
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of the framework, REX, on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
Pages: 66-81
Page count: 16