Using clustering for web information extraction

Cited by: 0

Authors:

Phong, Le [1]

Vuong, Bao [1]

Gao, Xiaoying [1]

Affiliation:

[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, POB 600, Wellington, New Zealand

Source:

AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2007 / Vol. 4830

Keywords:

information extraction; clustering; Smith-Waterman algorithm

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces a clustering-based approach to automated data extraction from semi-structured Web pages. Similarity between text tokens is measured using both their HTML tags and their textual features. A first clustering process groups similar text tokens into text clusters, and a second clustering process groups similar data tuples into tuple clusters. A tuple cluster is a strong candidate for a repetitive data region.
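The abstract says token similarity combines HTML tags with textual features, and the keywords name the Smith-Waterman algorithm, but the record does not say how the two are combined. The following is a minimal sketch under stated assumptions, not the authors' method: Smith-Waterman local alignment scores the root-to-token tag paths, a small hypothetical feature tuple stands in for the textual features, the two scores are averaged with an assumed weight `alpha=0.5`, and a simple leader clustering with an assumed threshold of 0.8 forms the first-stage text clusters. All names (`Token`, `smith_waterman`, `tag_similarity`, `text_features`, `cluster_tokens`) and scoring parameters are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    tag_path: tuple[str, ...]  # HTML tags from the root down to the token


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a and b (Smith-Waterman)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def tag_similarity(p, q, match=2):
    """Alignment score normalized to [0, 1] by the best score possible."""
    if not p or not q:
        return 0.0
    return smith_waterman(p, q, match=match) / (match * min(len(p), len(q)))


def text_features(s):
    # Hypothetical feature set: the abstract does not list the real features.
    return (
        s.isdigit(),                       # purely numeric, e.g. a price
        s[:1].isupper(),                   # starts with a capital letter
        any(ch.isdigit() for ch in s),     # contains any digit
        len(s.split()) > 3,                # longer phrase
    )


def text_similarity(s, t):
    fs, ft = text_features(s), text_features(t)
    return sum(x == y for x, y in zip(fs, ft)) / len(fs)


def similarity(a, b, alpha=0.5):
    # Assumed equal weighting of tag-path and textual similarity.
    return (alpha * tag_similarity(a.tag_path, b.tag_path)
            + (1 - alpha) * text_similarity(a.text, b.text))


def cluster_tokens(tokens, threshold=0.8):
    """Leader clustering: a token joins the first cluster whose leader is similar enough."""
    clusters = []
    for tok in tokens:
        for c in clusters:
            if similarity(c[0], tok) >= threshold:
                c.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters


row = ("html", "body", "table", "tr", "td")
tokens = [
    Token("Canon EOS 400D", row),
    Token("799", row),
    Token("Nikon D80", row),
    Token("999", row),
    Token("Contact us", ("html", "body", "div", "p")),
]
for cluster in cluster_tokens(tokens):
    print([t.text for t in cluster])
```

With these sample tokens the two product names, the two prices, and the footer text end up in three separate text clusters. The second stage in the abstract, grouping data tuples into tuple clusters, could reuse the same pattern with a tuple-level similarity, but the record gives no details of how tuples are formed or compared.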

Pages: 415 - +

Page count: 2