Multimodal Learning for Web Information Extraction

Cited by: 8
Authors
Gong, Dihong [1 ]
Wang, Daisy Zhe [1 ]
Peng, Yang [1 ]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
Source
Proceedings of the 2017 ACM Multimedia Conference (MM '17), 2017
Funding
U.S. National Science Foundation
Keywords
Multimodal; Information Extraction; Web Mining; Large-Scale
DOI
10.1145/3123266.3123296
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We consider the problem of extracting text instances of predefined categories (e.g., city and person) from the Web. Instances of a category may be scattered across thousands of independent sources in many different formats and with potential noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules such as "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually underconstrained. To address these problems, we propose to learn multimodal rules that overcome the limitations of purely syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is hard to disambiguate in one modality may be easily recognized in another. To demonstrate the effectiveness of this method, we have built an end-to-end multimodal information extraction system that takes unannotated raw web pages as input and produces a set of extracted instances (e.g., Boston is an instance of city) as output. More specifically, our system learns reliable relationships between multimodal information through multimodal relation analysis over large-scale unstructured data. Based on the learned relationships, we then train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning achieves greater accuracy for information extraction.
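To make the kind of syntactic rule mentioned in the abstract concrete, the minimal Python sketch below applies a single Hearst-style pattern ("cities such as _") to raw text and collects candidate instances of the city category. This is not the authors' system: the pattern, the function name extract_city_instances, and the sample sentence are illustrative assumptions only, and the paper's pipeline further combines such textual evidence with rules learned from other modalities.

import re

# Illustrative sketch (not the authors' implementation): extract candidate
# instances of the "city" category with one Hearst-style syntactic rule,
# "cities such as X, Y and Z". Pattern, names and sample text are assumptions.

NAME = r"[A-Z][\w.-]*(?:\s[A-Z][\w.-]*)*"  # a capitalized, possibly multi-word name
PATTERN = re.compile(
    r"\bcities such as (" + NAME +
    r"(?:,\s" + NAME + r")*"
    r"(?:,?\s(?:and|or)\s" + NAME + r")?)"
)

def extract_city_instances(text):
    """Return the set of candidate city names matched by the syntactic rule."""
    instances = set()
    for match in PATTERN.finditer(text):
        # Split the matched enumeration "X, Y and Z" into individual candidates.
        for part in re.split(r",\s*|\s(?:and|or)\s", match.group(1)):
            part = part.strip()
            if part:
                instances.add(part)
    return instances

if __name__ == "__main__":
    sample = "Popular cities such as Boston, Seattle and San Francisco attract many tourists."
    print(extract_city_instances(sample))  # {'Boston', 'Seattle', 'San Francisco'} (order may vary)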
Pages: 288-296 (9 pages)