Multimodal Learning for Web Information Extraction

Cited by: 8
Authors
Gong, Dihong [1]
Wang, Daisy Zhe [1]
Peng, Yang [1]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation;
Keywords
Multimodal; Information Extraction; Web Mining; LARGE-SCALE;
DOI
10.1145/3123266.3123296
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We consider the problem of extracting text instances of predefined categories (e.g. city and person) from the Web. Instances of a category may be scattered across thousands of independent sources in many different formats and with potential noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules like "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually underconstrained. To address these problems, in this paper we propose to learn multimodal rules that overcome the limitations of syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is difficult to disambiguate correctly in one modality may be easily recognized in another. To demonstrate the effectiveness of this method, we have built an end-to-end multimodal information extraction system that takes unannotated raw web pages as input and generates a set of extracted instances (e.g. Boston is an instance of city) as output. More specifically, our system learns reliable relationships among multimodal information through multimodal relation analysis on large-scale unstructured data. Based on the learned relationships, we further train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning can achieve greater accuracy for information extraction.
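As an illustration of the syntactic rules the abstract mentions, the following is a minimal sketch in Python, not the authors' system. It assumes the `_` slot in "cities such as _" and "_ is a city" can be approximated by a single capitalized word; the rule list, the function name `extract_city_candidates`, and the sample text are all hypothetical.

```python
import re

# Hypothetical hand-written versions of the two rules quoted in the abstract.
# The "_" slot is approximated by one capitalized word (an assumption).
RULES = [
    re.compile(r"cities such as ([A-Z][a-z]+)"),
    re.compile(r"([A-Z][a-z]+) is a city"),
]


def extract_city_candidates(text):
    """Return candidate city names matched by any rule, in first-seen order."""
    found = []
    for rule in RULES:
        for match in rule.finditer(text):
            name = match.group(1)
            if name not in found:
                found.append(name)
    return found


# Made-up sample text: the second sentence matches a rule but names a
# fictional place, so a purely syntactic rule still extracts it.
sample = (
    "Travelers love cities such as Boston. "
    "Gotham is a city in the Batman comics."
)

print(extract_city_candidates(sample))
```

On this sample the rules return both "Boston" and the fictional "Gotham", which shows on a small scale why purely syntactic rules can be unreliable and why a second modality might help disambiguate.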
Pages: 288-296
Page count: 9