News item extraction for text mining in web newspapers

Cited by: 4
Authors
Norvåg, K [1]
Oyri, R [1]
Affiliation
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
DOI
10.1109/WIRI.2005.27
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Web newspapers are a valuable source of information. To benefit more from the available information, text mining techniques can be applied. However, because each newspaper page often covers many unrelated topics, page-based data mining will not always give useful results. To improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.
Pages: 195-204
Page count: 10