Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining;
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543 - 3544
Number of pages: 2
Related papers
50 in total
  • [21] Generating finite-state transducers for semi-structured data extraction from the Web
    Hsu, CN
    Dung, MT
    INFORMATION SYSTEMS, 1998, 23 (08) : 521 - 538
  • [22] Interactive tuples extraction from semi-structured data
    Gilleron, Remi
    Marty, Patrick
    Tommasi, Marc
    Torre, Fabien
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 997 - 1004
  • [23] A strategy for extracting information from semi-structured web pages
    Shaker, Mahmoud
    Ibrahim, Hamidah
    Mustapha, Aida
    Abdullah, Lili Nurliyana
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2010, 6 (04) : 304 - 318
  • [24] Unsupervised Extraction of Product Information from Semi-structured Sources
    Walther, Maximilian
    13TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI 2012), 2012, : 257 - 262
  • [25] Scalable Attribute-Value Extraction from Semi-Structured Text
    Wong, Yuk Wah
    Widdows, Dominic
    Lokovic, Tom
    Nigam, Kamal
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 302 - 307
  • [26] Information extraction from semi-structured data in the protein data bank by induction of a data description pattern
    Kawaguchi, Y
    Kaneta, Y
    Ohkawa, T
    Nakamura, H
    Ito, N
    METMBS'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES, 2003, : 94 - 99
  • [27] Multi-level schema extraction for heterogeneous semi-structured data
    Yoon, JP
    Raghavan, V
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2000, 1846 : 411 - 422
  • [28] CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web
    Lockard, Colin
    Dong, Xin Luna
    Einolghozati, Arash
    Shiralkar, Prashant
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (10) : 1084 - 1096
  • [29] Building web warehouse for semi-structured data
    Mohania, M
    DATA & KNOWLEDGE ENGINEERING, 2001, 39 (02) : 101 - 103
  • [30] List data extraction in semi-structured document
    Xu, H
    Li, JZ
    Xu, P
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2005, 2005, 3806 : 584 - 585