Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications, including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g., text, visual, XML/HTML; and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543-3544
Number of pages: 2
Related papers (50 in total)
  • [1] Information extraction from semi-structured web documents
    Yun, Bo-Hyun
    Seo, Chang-Ho
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2006, 4092 : 586 - 598
  • [2] Interactive Data Extraction from Semi-Structured Text
    Broman, Per
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 1 - 19
  • [3] Information extraction from Web pages using semi-structured data alignment
    Kuboyama, Tetsuji
    Miyahara, Tetsuhiro
    Hirokawa, Sachio
    Itou, Eisuke
    WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005, : 42 - 47
  • [4] Bootstrapping Information Extraction from Semi-structured Web Pages
    Carlson, Andrew
    Schafer, Charles
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART I, PROCEEDINGS, 2008, 5211 : 195 - +
  • [5] Data extraction from semi-structured web pages by clustering
    Vuong, Le Phong Bao
    Gao, Xiaoying
    Zhang, Mengjie
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 374 - +
  • [6] Web Service for Data Extraction from Semi-structured Data Sources
    Yashina, Marina V.
    Nakonechnyy, Ivan I.
    PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON DEPENDABILITY AND COMPLEX SYSTEMS DEPCOS-RELCOMEX, 2014, 286 : 499 - 510
  • [7] Learning information extraction rules for semi-structured and free text
    Soderland, S
    MACHINE LEARNING, 1999, 34 (1-3) : 233 - 272
  • [8] Chinese resume information extraction based on semi-structured text
    Wentan, Yan
    Yupeng, Qiao
    Chinese Control Conference, CCC, 2017, : 11177 - 11182