Multimodal Learning for Web Information Extraction

Times Cited: 8

Authors:

Gong, Dihong [1]

Wang, Daisy Zhe [1]

Peng, Yang [1]

Affiliation:

[1] Univ Florida, Gainesville, FL 32611 USA

Source:

PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17) | 2017

Funding:

U.S. National Science Foundation (NSF)

Keywords:

Multimodal; Information Extraction; Web Mining; Large-Scale

DOI:

10.1145/3123266.3123296

Chinese Library Classification (CLC) Number:

TP301 [Theory, Methods]

Subject Classification Code:

081202 ;

Abstract:

We consider the problem of extracting text instances of predefined categories (e.g., city and person) from the Web. Instances of a category may be scattered across thousands of independent sources in many different formats and may contain noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules such as "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually under-constrained. To address these problems, we propose learning multimodal rules to overcome the limitations of syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is difficult to disambiguate correctly in one modality may be easily recognized in another. To demonstrate the effectiveness of this method, we have built a sophisticated end-to-end multimodal information extraction system that takes unannotated raw web pages as input and generates a set of extracted instances (e.g., Boston is an instance of city) as output. More specifically, our system learns reliable relationships between multimodal information through multimodal relation analysis on big unstructured data. Based on the learned relationships, we further train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning achieves greater information extraction accuracy.
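To make the baseline the abstract argues against more concrete, the Python sketch below shows how slot-filling syntactic rules such as "cities such as _" and "_ is a city" might extract candidate instances from raw sentences. It is a minimal illustration under our own assumptions, not the authors' system: the `{X}` template syntax, the capitalized-phrase slot regex, and the names `compile_template`, `extract`, and `SAMPLE` are all hypothetical. It also shows why such rules alone are unreliable, since "_ is a city" happily extracts the pronoun "It" as a city.

```python
import re
from collections import Counter

# Hypothetical rule templates; "{X}" marks the slot that an instance fills.
# They mirror the example rules quoted in the abstract.
TEMPLATES = ["cities such as {X}", "{X} is a city"]

# Assumed slot shape: a capitalized phrase of one to three words,
# e.g. "Boston" or "New York".
SLOT = r"([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})"

# Hypothetical sample sentences, including one that triggers a false positive.
SAMPLE = [
    "Travelers often visit cities such as Boston and Chicago.",
    "Boston is a city in Massachusetts.",
    "It is a city that never sleeps.",
    "New York is a city with many museums.",
]


def compile_template(template: str) -> re.Pattern:
    """Turn a rule template into a regex with one capture group for the slot."""
    before, after = template.split("{X}")
    return re.compile(re.escape(before) + SLOT + re.escape(after))


def extract(sentences, templates=TEMPLATES) -> Counter:
    """Count how often each candidate instance fills the slot of any template."""
    patterns = [compile_template(t) for t in templates]
    counts = Counter()
    for sentence in sentences:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for candidate, count in extract(SAMPLE).most_common():
        print(candidate, count)
```

On the four sample sentences, `extract(SAMPLE)` counts "Boston" twice and "New York" and the false positive "It" once each, while "Chicago" is missed because neither template reaches it. Evidence from another modality is the kind of signal the abstract proposes for filtering such errors; that combination step is deliberately not sketched here.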

Pages: 288-296

Page Count: 9
