Multimodal Learning for Web Information Extraction

Times cited: 8
Authors
Gong, Dihong [1]
Wang, Daisy Zhe [1]
Peng, Yang [1]
Affiliation
[1] Univ Florida, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Multimodal; Information Extraction; Web Mining; LARGE-SCALE
DOI
10.1145/3123266.3123296
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We consider the problem of extracting text instances of predefined categories (e.g., city and person) from the Web. Instances of a category may be scattered across thousands of independent sources, in many different formats and with potential noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules such as "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually underconstrained. To address these problems, in this paper we propose learning multimodal rules to overcome the limitations of syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is difficult to disambiguate correctly in one modality may be easily recognized in another. To demonstrate the effectiveness of this method, we have built an end-to-end multimodal information extraction system that takes unannotated raw web pages as input and generates a set of extracted instances (e.g., Boston is an instance of city) as output. More specifically, our system learns reliable relationships between information in different modalities through multimodal relation analysis on large-scale unstructured data. Based on the learned relationships, we further train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning achieves greater information extraction accuracy.
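To make concrete why syntactic rules learned in isolation are unreliable, the sketch below applies the two example rules from the abstract, "cities such as _" and "_ is a city", to a toy corpus. It is an illustration only, not the authors' system. The regex encoding of the rules, the sample sentences, and the function name `extract_with_rules` are hypothetical. On this input the rules miss "Paris", which follows "and" rather than "such as". They also accept the pronoun "It" as a city. These are the sparsity and noise problems the abstract describes.

```python
import re
from collections import Counter

# Hypothetical regex encodings of the two example rules from the abstract.
# Each capture group is the "_" slot that yields a candidate instance.
CITY_RULES = [
    re.compile(r"cities such as ([A-Z][a-z]+)"),
    re.compile(r"([A-Z][a-z]+) is a city"),
]


def extract_with_rules(sentences, rules):
    """Count how often each token fills the slot of any rule."""
    counts = Counter()
    for sentence in sentences:
        for rule in rules:
            for match in rule.finditer(sentence):
                counts[match.group(1)] += 1
    return counts


# Toy corpus, made up for illustration.
corpus = [
    "Tourists love cities such as Boston and Paris.",
    "Boston is a city in Massachusetts.",
    "Gainesville is a city in Florida.",
    "It is a city of contrasts.",
]

candidates = extract_with_rules(corpus, CITY_RULES)
print(candidates.most_common())
# → [('Boston', 2), ('Gainesville', 1), ('It', 1)]
# "Paris" is never extracted (low recall), while the pronoun "It"
# is accepted as a city (noise).
```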
Pages: 288-296
Number of pages: 9