An approach for outlier and novelty detection for text data based on classifier confidence

Cited by: 1
Authors
Pizurica, Nikola [1]
Tomovic, Savo [1]
Affiliation
[1] Univ Montenegro, Fac Math & Nat Sci, Cetinjska 2, Podgorica 81000, Montenegro
Keywords
Classification; novelty detection; outlier detection; classifier confidence; information retrieval
DOI
10.3233/AIC-200649
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present an approach for novelty detection in text data. The approach can also be regarded as semi-supervised anomaly detection, because the training dataset contains labelled instances of the known classes only. During the training phase, a classification model is learned; at least two known classes are assumed to be present in the training dataset. In the testing phase, instances are classified as normal or anomalous based on the classifier's confidence. In other words, if the classifier cannot assign any of the known class labels to a given instance with sufficiently high confidence (probability), the instance is declared a novelty (anomaly). We propose two procedures for objectively measuring classifier confidence. Experimental results show that the proposed approach performs comparably to methods known in the literature.
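The abstract does not spell out the two confidence-measurement procedures, so the sketch below does not reproduce them. It only illustrates the general decision rule under stated assumptions: a multinomial Naive Bayes text classifier (an assumption; the abstract does not name the classifier) is trained on at least two known classes, its maximum posterior probability is taken as the confidence, and any instance whose confidence falls below a fixed threshold (a hypothetical value of 0.9) is declared a novelty. The names `NaiveBayesText` and `classify_or_flag`, and the toy training data, are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Multinomial Naive Bayes over lowercase whitespace tokens, Laplace-smoothed.

    Illustrative stand-in classifier (assumption); not the authors' model.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        # The approach assumes at least two known classes in the training data.
        self.classes = sorted(set(labels))
        if len(self.classes) < 2:
            raise ValueError("training data must contain at least two known classes")
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        class_sizes = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(class_sizes[c] / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Posterior P(class | doc) for each known class."""
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in doc.lower().split():
                if w not in self.vocab:
                    continue  # unseen tokens carry no evidence for any known class
                s += math.log((self.word_counts[c][w] + self.alpha)
                              / (self.totals[c] + self.alpha * v))
            log_scores[c] = s
        # Normalise with the log-sum-exp trick for numerical stability.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {c: math.exp(s - m) / z for c, s in log_scores.items()}


def classify_or_flag(model, doc, threshold=0.9):
    """Return (label, confidence); label is 'NOVELTY' when confidence < threshold."""
    proba = model.predict_proba(doc)
    label, confidence = max(proba.items(), key=lambda kv: kv[1])
    return (label if confidence >= threshold else "NOVELTY"), confidence


# Toy training set with two known classes (hypothetical data).
train_docs = [
    "football match goal team", "team won the match", "goal scored in football game",
    "stock market shares price", "market price of shares fell", "investors bought stock",
]
train_labels = ["sports"] * 3 + ["finance"] * 3
model = NaiveBayesText().fit(train_docs, train_labels)

for text in ["football team scored a goal", "stock price and market",
             "recipe with garlic and olive oil"]:
    print(text, "->", classify_or_flag(model, text))
```

Tokens never seen during training are skipped, so a text made up entirely of such tokens gets back the class priors (0.5 per class here) and is flagged as a novelty. The threshold of 0.9 is arbitrary; in practice it would typically be tuned on held-out data from the known classes.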
Pages: 139-153
Number of pages: 15
Related papers
50 in total
  • [1] A Model-based Approach for Text Clustering with Outlier Detection
    Yin, Jianhua
    Wang, Jianyong
    2016 32nd IEEE International Conference on Data Engineering (ICDE), 2016: 625-636
  • [2] Application of LVQ to novelty detection using outlier training data
    Lee, Hyoung-joo
    Cho, Sungzoon
    Pattern Recognition Letters, 2006, 27 (13): 1572-1579
  • [3] Fuzzy clustering-based semi-supervised approach for outlier detection in big text data
    Lazhar, Farek
    Progress in Artificial Intelligence, 2019, 8 (1): 123-132
  • [4] Text detection approach based on confidence map and context information
    Wang, Runmin
    Sang, Nong
    Gao, Changxin
    Neurocomputing, 2015, 157: 153-165
  • [5] Clustering ensemble-based novelty score for outlier detection
    Yu, Jaehong
    Kang, Jihoon
    Engineering Applications of Artificial Intelligence, 2023, 121
  • [6] Exploring Confidence-Based Neighborhoods in Outlier Detection
    Fu, Juihsi
    Lee, Singling
    Wu, Chiawen
    Proceedings of 2013 International Conference on Machine Learning and Cybernetics (ICMLC), Vols 1-4, 2013: 81-86
  • [7] An Explainable Outlier Detection-based Data Cleaning Approach for Intrusion Detection
    Ha, Theodore
    Shao, Sicong
    Hariri, Salim
    2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), 2023
  • [8] Outlier detection based on confidence band and extreme value theory for semi-supervised learning of an incremental polynomial classifier
    Al-Behadili, H.
    Grumpe, A.
    Wöhler, C.
    International Journal of Simulation: Systems, Science and Technology, 2016, 17 (34): 15.1-15.7
  • [9] An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data
    Zhang, Ke
    Jin, Huidong
    AI 2010: Advances in Artificial Intelligence, 2010, 6464: 122-131