News item extraction for text mining in web newspapers

Cited by: 4
Authors
Norvåg, K [1]
Oyri, R [1]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
DOI
10.1109/WIRI.2005.27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web newspapers are a valuable source of information. To benefit more from that information, text mining techniques can be applied. However, because a single newspaper page often covers many unrelated topics, page-based data mining does not always give useful results. To improve on complete-page mining, we present an approach that extracts the individual news items from the web pages and mines them separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction and also provide results from clustering the extracted news items.
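As a rough illustration of why item-level mining can help, the sketch below clusters individual news items rather than whole pages. It is not the method from the paper: the abstract does not describe the extraction or clustering algorithms, so the sample items, the `bag_of_words`, `cosine`, and `cluster_items` helpers, and the similarity threshold are all hypothetical stand-ins, and the items are assumed to be already extracted.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for real text preprocessing."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_items(items, threshold=0.3):
    """Greedy single-pass clustering (an assumption, not the paper's algorithm).

    Each item joins the first cluster whose seed item is similar enough;
    otherwise it starts a new cluster. Clusters hold item indices.
    """
    vectors = [bag_of_words(t) for t in items]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical news items, assumed to be already extracted from front pages.
items = [
    "Parliament votes on the new budget proposal",
    "Football club wins the cup final",
    "Budget proposal passes the parliament vote",
    "Cup final ends with a late football goal",
]
clusters = cluster_items(items)
print(clusters)
```

With these sample items, the politics items and the sports items end up in separate clusters, even though they could have appeared on the same page. Treating the page as a single document would merge both topics into one vector.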
Pages: 195-204
Page count: 10