Web-Scale Extension of RDF Knowledge Bases from Templated Websites

Cited by: 0
Authors
Buehmann, Lorenz [1]
Usbeck, Ricardo [1, 2]
Ngomo, Axel-Cyrille Ngonga [1]
Saleem, Muhammad [1]
Both, Andreas [2]
Crescenzi, Valter [3]
Merialdo, Paolo [3]
Qiu, Disheng [3]
Affiliations
[1] Univ Leipzig, IFI AKSW, Leipzig, Germany
[2] Unister GmbH, Leipzig, Germany
[3] Univ Roma Tre, Rome, Italy
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of the framework, REX, on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
Pages: 66-81
Page count: 16
Related papers
50 in total
  • [21] Unifying Web-Scale Search and Reasoning from the Viewpoint of Granularity
    Zeng, Yi
    Wang, Yan
    Huang, Zhisheng
    Zhong, Ning
    ACTIVE MEDIA TECHNOLOGY, PROCEEDINGS, 2009, 5820 : 418 - +
  • [22] GeLoGo: Detecting TV logos from Web-Scale Videos
    Ye, Qiting
    Luo, Zhao
    Xiao, Xiaobing
    Ge, Shiming
    2017 IEEE THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2017), 2017, : 250 - 251
  • [23] Populating Web-Scale Knowledge Graphs Using Distantly Supervised Relation Extraction and Validation
    Dash, Sarthak
    Glass, Michael R.
    Gliozzo, Alfio
    Canim, Mustafa
    Rossiello, Gaetano
    INFORMATION, 2021, 12 (08)
  • [24] Gender and animacy knowledge discovery from web-scale n-grams for unsupervised person mention detection
    Ji, Heng
    Lin, Dekang
    PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009, 1 : 220 - 229
  • [25] Learning Query and Document Relevance from a Web-scale Click Graph
    Jiang, Shan
    Hu, Yuening
    Kang, Changsung
    Daly, Tim, Jr.
    Yin, Dawei
    Chang, Yi
    Zhai, Chengxiang
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 185 - 194
  • [26] Weakly Supervised Learning of Object Segmentations from Web-Scale Video
    Hartmann, Glenn
    Grundmann, Matthias
    Hoffman, Judy
    Tsai, David
    Kwatra, Vivek
    Madani, Omid
    Vijayanarasimhan, Sudheendra
    Essa, Irfan
    Rehg, James
    Sukthankar, Rahul
    COMPUTER VISION - ECCV 2012: WORKSHOPS AND DEMONSTRATIONS, PT I, 2012, 7583 : 198 - 208
  • [27] Drinking From a Firehose: Continual Learning With Web-Scale Natural Language
    Hu, Hexiang
    Sener, Ozan
    Sha, Fei
    Koltun, Vladlen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 5684 - 5696
  • [28] Statistical Induction of Coupled Domain/Range Restrictions from RDF Knowledge Bases
    Ell, Basil
    Hakimov, Sherzod
    Cimiano, Philipp
    KNOWLEDGE GRAPHS AND LANGUAGE TECHNOLOGY, 2017, 10579 : 27 - 40
  • [29] Web-KR 2013: The 4th International Workshop on Web-scale Knowledge Representation, Retrieval and Reasoning
    Zeng, Yi
    Kotoulas, Spyros
    Huang, Zhisheng
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013,
  • [30] Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
    Panchenko, Alexander
    Ruppert, Eugen
    Faralli, Stefano
    Ponzetto, Simone P.
    Biemann, Chris
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1816 - 1823