Efficient Online Novelty Detection in News Streams

Cited by: 0
Authors
Karkali, Margarita [1]
Rousseau, Francois [2]
Ntoulas, Alexandros [3, 4]
Vazirgiannis, Michalis [1, 2, 5]
Affiliations
[1] Athens Univ Econ & Business, Athens, Greece
[2] LIX, Ecole Polytech, Palaiseau, France
[3] Natl & Kapodistrian Univ Athens, Athens, Greece
[4] Zynga, San Francisco, CA 94103 USA
[5] Telecom Paris, Inst Mines Telecom, Paris, France
Keywords
novelty detection; inverse document frequency; news streams
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Novelty detection in text streams is a challenging task that arises in many scenarios, ranging from email threads to RSS news feeds on a cell phone. An efficient novelty detection algorithm can save the user a great deal of time when accessing interesting information. Most recent research on detecting novel documents in text streams uses either geometric distances or distributional similarities. The former typically perform better but are slower, because each incoming document must be compared with all previously seen ones. In this paper, we propose a new novelty detection algorithm based on the Inverse Document Frequency (IDF) scoring function. Computing novelty from IDF avoids similarity comparisons with previous documents in the text stream and therefore leads to faster execution times. At the same time, the proposed approach outperforms several commonly used baselines when applied to a real-world dataset of news articles.
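The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a document's novelty is the mean smoothed IDF of its distinct terms, computed from document frequencies collected over the stream so far. The class name IdfNoveltyScorer, the tokenize helper, and the smoothing log((N + 1) / (df + 1)) are all illustrative assumptions. The point it shows is that scoring touches only a term-count table, never the earlier documents themselves.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IdfNoveltyScorer:
    """Hypothetical online scorer: mean smoothed IDF of a document's distinct terms."""

    def __init__(self):
        self.num_docs = 0          # documents seen so far in the stream
        self.doc_freq = Counter()  # term -> number of seen documents containing it

    def score(self, tokens):
        """Return the novelty score of a document without changing the state."""
        terms = set(tokens)
        if not terms:
            return 0.0
        n = self.num_docs
        # Assumed smoothing: unseen terms (df = 0) get the largest IDF, log(n + 1).
        total = sum(math.log((n + 1) / (self.doc_freq[t] + 1)) for t in terms)
        return total / len(terms)

    def update(self, tokens):
        """Add a document to the stream statistics."""
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def score_and_update(self, tokens):
        """Score an incoming document against the past, then record it."""
        value = self.score(tokens)
        self.update(tokens)
        return value


if __name__ == "__main__":
    scorer = IdfNoveltyScorer()
    stream = [
        "Stocks fall as markets react",
        "Markets react to stocks news",
        "Volcano erupts near island village",
    ]
    for doc in stream:
        print(round(scorer.score_and_update(tokenize(doc)), 3), doc)
```

Each score costs time proportional to the number of distinct terms in the incoming document, regardless of how many documents came before it. Under this smoothing the very first document scores 0 because there is no history to compare against; a practical system would likely need a warm-up corpus or a different prior.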
Pages: 57-71
Number of pages: 15