Using clustering for web information extraction

Cited by: 0

Authors:

Phong, Le [1]

Vuong, Bao [1]

Gao, Xiaoying [1]

Affiliations:

[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, POB 600, Wellington, New Zealand

Source:

AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2007 / Vol. 4830

Keywords:

information extraction; clustering; Smith-Waterman algorithm

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces an approach that automates data extraction from semi-structured Web pages using clustering. Both HTML tags and the textual features of text tokens are considered when comparing similarity. The first clustering process groups similar text tokens into text clusters, and the second clustering process groups similar data tuples into tuple clusters. A tuple cluster is a strong candidate for a repetitive data region.
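The record lists the Smith-Waterman algorithm as a keyword, but the abstract does not say how it is applied. The following is only an illustrative sketch, not the authors' method: it assumes each data tuple is represented as a list of HTML tag names, scores pairs of tuples with a Smith-Waterman local alignment, and groups them with a simple greedy threshold clustering. The scoring values, the normalisation, the threshold of 0.6, and the function names are all assumptions made for this example.

```python
MATCH, MISMATCH, GAP = 2, -1, -1  # assumed scoring scheme, not from the paper


def smith_waterman(a, b):
    """Return the best local alignment score between token sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if x == y else MISMATCH)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(a, b):
    """Scale the score to [0, 1] by the best score the shorter sequence could reach."""
    if not a or not b:
        return 0.0
    return smith_waterman(a, b) / (MATCH * min(len(a), len(b)))


def cluster(items, threshold=0.6):
    """Greedy clustering: an item joins the first cluster whose first member is similar enough."""
    clusters = []
    for item in items:
        for members in clusters:
            if similarity(item, members[0]) >= threshold:
                members.append(item)
                break
        else:
            clusters.append([item])
    return clusters


# Hypothetical tuples, each written as the tag sequence of one candidate record.
tuples = [
    ["tr", "td", "a", "td", "b"],
    ["tr", "td", "a", "td", "b"],
    ["tr", "td", "a", "td", "i"],
    ["div", "p", "span"],
]
tuple_clusters = cluster(tuples)
```

With these hypothetical inputs the three table-row tuples fall into one cluster and the `div` tuple into another; under the abstract's reading, the larger cluster would be the candidate repetitive data region.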


Pages: 415 / +

Page count: 2
