Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Cited by: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications, including knowledge base construction, question answering, and recommendation. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity of data modalities that are leveraged, e.g., text, visual, and XML/HTML, and 2) the thrust to develop scalable approaches that need zero to limited human supervision.
Pages: 3543-3544
Page count: 2