Extracting Topic Maps from Web Pages

Cited by: 0
Authors
Mase, Motohiro [1]
Yamada, Seiji [2]
Nitta, Katsumi [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Natl Inst Informat, Tokyo, Japan
Source
Keywords
Web information extraction; Topic Maps; clustering
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a framework for extracting topic maps from a set of Web pages. We cluster the Web pages and extract topic map prototypes from the clusters. We extend an existing clustering method in two respects: the relevance between pages is based on (1) the types of links with respect to the directory structure of the Web site and (2) the distance between the directories in which the pages are located. We generate the topic map prototypes from the clustering results. Finally, users complete a prototype by labeling the topics and associations and removing unnecessary items. In this paper, as a first step, we implemented the proposed clustering method and used it to extract a prototype.
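The two directory-based signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' definitions, so everything below is an assumption made for illustration: the function names, the step-counting distance (steps up to the deepest shared directory and back down), the four link-type labels, and the premise that both URLs belong to the same site.

```python
from posixpath import dirname
from urllib.parse import urlparse


def directory_parts(url):
    """Return the directory components of a URL path, dropping any file name."""
    path = urlparse(url).path
    # A path ending in "/" already names a directory; otherwise drop the last segment.
    directory = path if path.endswith("/") else dirname(path)
    return [part for part in directory.split("/") if part]


def directory_distance(url_a, url_b):
    """Hypothetical distance: steps up to the deepest shared directory, then down."""
    a, b = directory_parts(url_a), directory_parts(url_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def link_type(source_url, target_url):
    """Hypothetical link labels, based on where the target directory lies."""
    src, dst = directory_parts(source_url), directory_parts(target_url)
    if src == dst:
        return "same"
    if dst[:len(src)] == src:
        return "downward"   # target sits below the source directory
    if src[:len(dst)] == dst:
        return "upward"     # target sits above the source directory
    return "crosswise"      # target sits in a different branch
```

A relevance score for clustering could then, for example, weight each link by its type and discount it by the directory distance; how the paper actually combines the two signals is not stated in the abstract.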
Pages: 169 / +
Page count: 3