Extracting Topic Maps from Web Pages

Cited by: 0

Authors:

Mase, Motohiro [1]

Yamada, Seiji [2]

Nitta, Katsumi [1]

Affiliations:

[1] Tokyo Inst Technol, Tokyo, Japan

[2] Natl Inst Informat, Tokyo, Japan

Source:

NEW FRONTIERS IN APPLIED DATA MINING | 2009 / Vol. 5433

Keywords:

Web information extraction; Topic Maps; clustering

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a framework for extracting topic maps from a set of Web pages. We cluster the Web pages and extract topic map prototypes from the result. We introduce the following two points into an existing clustering method: relevance is based on the types of links relative to the directory structure of the Web site, and on the distance between the directories in which the pages are located. We generate the topic map prototypes from the clustering results. Finally, users complete a prototype by labeling the topics and associations and removing unnecessary items. In this paper, as a first step, we implemented the proposed clustering method and extracted a prototype with it.
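
The abstract does not define how the distance between directories is computed. As a minimal sketch of one plausible reading, the Python below treats a site's directories as a tree and counts the steps from one page's directory up to the nearest common ancestor and down to the other page's directory. The function names `directory_of` and `directory_distance`, and the choice of tree distance itself, are assumptions for illustration and are not taken from the paper.

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlparse


def directory_of(url: str) -> list[str]:
    """Return the directory segments of a URL path, dropping any file name."""
    path = urlparse(url).path
    # A path ending in "/" already names a directory; otherwise strip the file.
    directory = path if path.endswith("/") else posixpath.dirname(path)
    return [segment for segment in directory.split("/") if segment]


def directory_distance(url_a: str, url_b: str) -> int:
    """Assumed tree distance between the directories holding two pages.

    Counts steps up from A's directory to the nearest common ancestor,
    plus steps down from that ancestor to B's directory.
    """
    a, b = directory_of(url_a), directory_of(url_b)
    common = 0
    for seg_a, seg_b in zip(a, b):
        if seg_a != seg_b:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the function.
    print(directory_distance(
        "http://example.com/a/b/page.html",
        "http://example.com/a/c/d/other.html",
    ))
```

Under this assumption, pages in the same directory have distance 0, and the value grows as the pages sit further apart in the site hierarchy. A clustering method could combine such a distance with link-type information when scoring relevance, although the exact combination used by the authors is not stated in the abstract.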


Pages: 169 / +

Page count: 3

