Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

Times cited: 10
Authors
Dong, Xin Luna [1]
Hajishirzi, Hannaneh [2]
Lockard, Colin [3]
Shiralkar, Prashant [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Univ Washington, Allen Inst AI, Seattle, WA USA
[3] Univ Washington, Amazon, Seattle, WA USA
Keywords
Information extraction; Web extraction; Semi-structured data; Web mining
DOI
10.1145/3394486.3406468
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g., text, visual, and XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Pages: 3543-3544
Number of pages: 2
Related papers
50 items in total
  • [41] StrucTexT: Structured Text Understanding with Multi-Modal Transformers
    Li, Yulin
    Qian, Yuxi
    Yu, Yuechen
    Qin, Xiameng
    Zhang, Chenquan
    Liu, Yan
    Yao, Kun
    Han, Junyu
    Liu, Jingtuo
    Ding, Errui
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 1912-1920
  • [42] Knowledge extraction from semi-structured data based on fuzzy techniques
    Ceravolo, P
    Nocerino, MC
    Viviani, M
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2004, 3215: 328-334
  • [43] EGA: An algorithm for automatic semi-structured Web documents extraction
    Li, LY
    Tang, SW
    Yang, DQ
    Wang, TJ
    Su, ZH
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2004, 2973: 787-798
  • [45] Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages
    Sarkhel, Ritesh
    Huang, Binxuan
    Lockard, Colin
    Shiralkar, Prashant
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (11): 3098-3110
  • [46] Semi-structured Data Extraction and Schema Knowledge Mining
    Chen, Enhong (陈恩红)
    High Technology Letters, 2001, (01): 1-5
  • [47] Approximate graph schema extraction for semi-structured data
    Wang, QY
    Yu, JX
    Wong, KF
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2000, PROCEEDINGS, 2000, 1777: 302-316
  • [48] Semi-structured data extraction and modelling: the WIA Project
    Colombo, Gianluca
    Colombo, Ettore
    Bonomi, Andrea
    Mosca, Alessandro
    Bassis, Simone
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (130): 98-103
  • [49] Semi-structured data extraction and schema knowledge mining
    Chen, E.
    Wang, X.
    High Technology Letters, 2001, 7 (01): 1-5
  • [50] Interoperability and semi-structured data in an open web-based agent information system
    Lu, HG
    Sterling, L
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, VOL I, 2000: 80-86