News item extraction for text mining in web newspapers

Times cited: 4
Authors
Norvåg, K [1]
Oyri, R [1]
Affiliation
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
DOI
10.1109/WIRI.2005.27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Web newspapers provide a valuable source of information. To benefit more from the available information, text mining techniques can be applied. However, because each newspaper page often covers many unrelated topics, page-based data mining will not always give useful results. To improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.
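The abstract's argument for mining individual news items rather than whole pages can be illustrated with a toy example. This is not the authors' extraction or clustering method: it assumes the news items have already been extracted as plain strings, and it uses a simple bag-of-words cosine similarity with a single-pass "leader" clustering. The function names, the example texts, and the threshold of 0.3 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lower-cased, whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(texts, threshold=0.3):
    """Hypothetical single-pass leader clustering: each text joins the first
    cluster whose leader (first member) is at least `threshold` similar,
    otherwise it starts a new cluster. Returns lists of input indices."""
    clusters = []  # list of (leader_vector, member_indices)
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster was found
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

# Two made-up front pages, each mixing a sports item and a politics item.
page_a_items = ["football match ends in late goal",
                "parliament election results announced"]
page_b_items = ["late goal decides football cup final",
                "election turnout rises for parliament vote"]

# Page-level mining: each whole page is one document, so topics are blended.
pages = [" ".join(page_a_items), " ".join(page_b_items)]
print(cluster(pages))  # [[0, 1]]: both mixed pages land in one cluster

# Item-level mining: the extracted items are clustered separately.
items = page_a_items + page_b_items
print(cluster(items))  # [[0, 2], [1, 3]]: sports items vs. politics items
```

Because each page contains both topics, the two pages look similar as wholes and end up in a single cluster; once the items are separated, the sports and politics stories fall into different clusters.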
Pages: 195-204
Number of pages: 10
Related papers (50 in total)
  • [41] NEWSPAPERS AND THE NEWS
    Unknown
    SOCIOLOGY AND SOCIAL RESEARCH, 1937, 22 (01): 78
  • [42] A Generic Web News Extraction Approach
    Dong, Yongquan
    Li, Qingzhong
    Yan, Zhongmin
    Ding, Yanhui
    2008 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-4, 2008: 179-183
  • [43] NEWSPAPERS AND THE NEWS
    Moore, Harry Estill
    SOCIAL FORCES, 1939, 17 (03): 440-443
  • [44] NEWSPAPERS AND THE NEWS
    Merwin, Fred E.
    JOURNALISM QUARTERLY, 1937, 14 (03): 280-281
  • [45] News Recommendation Based on Web Usage and Web Content Mining
    Husin, Husna Sarirah
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2013: 326-329
  • [46] Newspapers and the News
    Unknown
    JOURNAL OF EDUCATIONAL SOCIOLOGY, 1937, 11 (01): 61-62
  • [47] Keyword extraction strategy for item banks text categorization
    Nuntiyagul, Atorn
    Naruedomkul, Kanlaya
    Cercone, Nick
    Wongsawang, Damras
    COMPUTATIONAL INTELLIGENCE, 2007, 23 (01): 28-44
  • [48] Network text analysis of medical tourism in newspapers using text mining: The South Korea case
    Kim, Sohyeon
    Lee, Won Seok
    TOURISM MANAGEMENT PERSPECTIVES, 2019, 31: 332-339
  • [49] Text Extraction from Web Images
    Liu, Changsong
    Yang, Cheng
    Ding, Xiaoqing
    Fan, Jian
    IMAGING AND PRINTING IN A WEB 2.0 WORLD II, 2011, 7879
  • [50] News mining and fusion in web big data
    Tan, X. (tanx@china.com.cn)
    Hubei MinZu University, College of Literature and Communication, Enshi, Hubei Province, China
    1600, Sila Science, University Mah Mekan Sok, No 24, Trabzon, Turkey (32)