News item extraction for text mining in web newspapers

Cited by: 4
Authors
Norvåg, K. [1]
Oyri, R. [1]
Affiliation
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
DOI
10.1109/WIRI.2005.27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Web newspapers are a valuable source of information. To benefit more from this information, text mining techniques can be applied. However, because each newspaper page often covers many unrelated topics, page-based data mining does not always give useful results. To improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.
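The abstract does not describe the extraction strategies themselves, so the following is only a minimal sketch of the general idea of splitting a page into news items before mining, not the authors' method. It assumes, hypothetically, that every news item on a page starts with a headline element (h1, h2 or h3); text before the first headline (menus, banners) and any script/style content is discarded. The names `HeadlineSplitter` and `extract_news_items` are illustrative only.

```python
from html.parser import HTMLParser


class HeadlineSplitter(HTMLParser):
    """Hypothetical heuristic: start a new news item at every h1/h2/h3 headline."""

    HEADLINE_TAGS = {"h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}  # never part of an item's text

    def __init__(self):
        super().__init__()
        self.items = []            # extracted items as {"title": ..., "body": ...}
        self._in_headline = False  # currently inside a headline element?
        self._skip_depth = 0       # nesting depth of script/style elements
        self._title = []
        self._body = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.HEADLINE_TAGS:
            self._flush()          # a new headline closes the previous item
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADLINE_TAGS:
            self._in_headline = False

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        (self._title if self._in_headline else self._body).append(text)

    def _flush(self):
        # Text seen before the first headline has no title and is dropped.
        if self._title:
            self.items.append({"title": " ".join(self._title),
                               "body": " ".join(self._body)})
        self._title, self._body = [], []

    def close(self):
        super().close()
        self._flush()              # emit the last item on the page


def extract_news_items(html):
    """Split one newspaper page into separate items that can be mined individually."""
    parser = HeadlineSplitter()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = ("<nav>Home | Sports</nav>"
            "<h2>Storm hits coast</h2><p>Heavy rain and wind.</p>"
            "<h2>Election results</h2><p>Votes counted.</p>")
    for item in extract_news_items(page):
        print(item["title"], "->", item["body"])
```

Each returned item, rather than the whole page, would then be the unit fed to a clustering step. Real newspaper layouts (teasers, captions, sidebars, items without headline tags) will often break the one-headline-per-item assumption, which is consistent with the abstract describing automatic extraction as a difficult problem.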
Pages: 195-204
Number of pages: 10
Related papers
  • [1] Mining on Terms Extraction from Web News
    Hsu, Li-Fu
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2010, 6421 : 188 - 194
  • [2] The feature extraction of text mining based on Web
    Liu, LZ
    Chen, JJ
    Song, HT
    ICEMI'2003: PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON ELECTRONIC MEASUREMENT & INSTRUMENTS, VOLS 1-3, 2003: 547 - 550
  • [3] Research and realization of extraction algorithm on web text mining
    Yin, Shiqun
    Qu, Yuhui
    Ge, Jike
    Lan, Xiaohong
    IITA 2007: WORKSHOP ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, PROCEEDINGS, 2007: 278 - +
  • [4] Extraction of news content for text mining based on edit distance
    Lan, Qiujun
    Journal of Computational Information Systems, 2010, 6 (11): 3761 - 3778
  • [5] Title-Based Extraction of News Contents for Text Mining
    Tan, Zhen
    He, Chunhui
    Fang, Yang
    Ge, Bin
    Xiao, Weidong
    IEEE ACCESS, 2018, 6 : 64085 - 64095
  • [6] Web News Data Extraction Technology Based on Text Keywords
    Zhang, Kun
    COMPLEXITY, 2021, 2021
  • [7] Unsupervised learning of mDTD extraction patterns for Web text mining
    Kim, D
    Jung, HM
    Lee, GG
    INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (04) : 623 - 637
  • [8] INFORMATION EXTRACTION VERSUS TEXT SEGMENTATION FOR WEB CONTENT MINING
    Fragkou, Pavlina
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2013, 23 (08) : 1109 - 1137
  • [9] Extremely local news: Community newspapers on the Web
    Marcus, J
    DATABASE, 1998, 21 (02): 73 - 75
  • [10] Automatic Text Extraction from Arabic Newspapers
    Vasilopoulos, Nikos
    Wasfi, Yazan
    Kavallieratou, Ergina
    IMAGE ANALYSIS AND RECOGNITION (ICIAR 2018), 2018, 10882 : 505 - 510