Multimodal Learning for Web Information Extraction

Cited by: 8
Authors
Gong, Dihong [1 ]
Wang, Daisy Zhe [1 ]
Peng, Yang [1 ]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
Source
Proceedings of the 2017 ACM Multimedia Conference (MM '17), 2017
Funding
U.S. National Science Foundation
Keywords
Multimodal; Information Extraction; Web Mining; Large-Scale
DOI
10.1145/3123266.3123296
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We consider the problem of extracting text instances of predefined categories (e.g., city and person) from the Web. Instances of a category may be scattered across thousands of independent sources in many different formats and with potential noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules such as "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually underconstrained. To address these problems, we propose to learn multimodal rules that overcome the limitations of purely syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is hard to disambiguate in one modality may be easily recognized in another. To demonstrate the effectiveness of this method, we have built an end-to-end multimodal information extraction system that takes unannotated raw web pages as input and produces a set of extracted instances (e.g., Boston is an instance of city) as output. More specifically, our system learns reliable relationships between multimodal information through multimodal relation analysis over large-scale unstructured data. Based on the learned relationships, we then train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning achieves greater accuracy for information extraction.
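To make the kind of syntactic rule mentioned in the abstract concrete, the minimal Python sketch below applies a single Hearst-style pattern ("cities such as _") to raw text and collects candidate instances of the city category. This is not the authors' system: the pattern, the function name extract_city_instances, and the sample sentence are illustrative assumptions only, and the paper's pipeline further combines such textual evidence with rules learned from other modalities.

import re

# Illustrative sketch (not the authors' implementation): extract candidate
# instances of the "city" category with one Hearst-style syntactic rule,
# "cities such as X, Y and Z". Pattern, names and sample text are assumptions.

NAME = r"[A-Z][\w.-]*(?:\s[A-Z][\w.-]*)*"  # a capitalized, possibly multi-word name
PATTERN = re.compile(
    r"\bcities such as (" + NAME +
    r"(?:,\s" + NAME + r")*"
    r"(?:,?\s(?:and|or)\s" + NAME + r")?)"
)

def extract_city_instances(text):
    """Return the set of candidate city names matched by the syntactic rule."""
    instances = set()
    for match in PATTERN.finditer(text):
        # Split the matched enumeration "X, Y and Z" into individual candidates.
        for part in re.split(r",\s*|\s(?:and|or)\s", match.group(1)):
            part = part.strip()
            if part:
                instances.add(part)
    return instances

if __name__ == "__main__":
    sample = "Popular cities such as Boston, Seattle and San Francisco attract many tourists."
    print(extract_city_instances(sample))  # {'Boston', 'Seattle', 'San Francisco'} (order may vary)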
Pages: 288-296 (9 pages)