Web-Scale Extension of RDF Knowledge Bases from Templated Websites

Authors
Buehmann, Lorenz [1]
Usbeck, Ricardo [1,2]
Ngomo, Axel-Cyrille Ngonga [1]
Saleem, Muhammad [1]
Both, Andreas [2]
Crescenzi, Valter [3]
Merialdo, Paolo [3]
Qiu, Disheng [3]
Affiliations
[1] Univ Leipzig, IFI AKSW, Leipzig, Germany
[2] Unister GmbH, Leipzig, Germany
[3] Univ Roma Tre, Rome, Italy
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc extraction methods. In this paper, we present REX, a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
Pages: 66-81
Page count: 16