Feature Transformations for Outlier Detection in Classification of Text Documents

Times cited: 0
Author
Walkowiak, Tomasz [1]
Affiliation
[1] Wroclaw University of Science and Technology, Faculty of Information and Communication Technology, Wroclaw, Poland
DOI
10.1007/978-3-031-06746-4_35
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate how feature transformations influence the results of outlier detection for text documents. We tested four outlier detection methods: Local Outlier Factor, Extreme Value Machine, Weibull-calibrated SVM, and the Mahalanobis distance. The analyzed documents are represented by different feature vectors, ranging from TF-IDF, through two types of averaged word embeddings, to document embeddings generated by the BERT network. Experimenting on two different text corpora, we show how a transformation of the feature space (the vector representation of documents) influences the outlier detection results.
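To make one of the four methods named above concrete, here is a minimal Python/NumPy sketch of Mahalanobis-distance outlier scoring on document vectors. It assumes the vectors (TF-IDF, averaged word embeddings, or BERT embeddings) have already been computed as rows of a matrix, fits a single mean and covariance to in-distribution training documents, and uses a pseudo-inverse because covariance matrices of high-dimensional TF-IDF vectors are often singular. The class name `MahalanobisScorer` and the `l2_normalize` transformation are hypothetical choices for illustration only; they are not taken from the paper, which may use different feature transformations and a per-class formulation.

```python
import numpy as np


def l2_normalize(X):
    """Hypothetical example transformation: scale each document vector (row) to unit length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows (e.g. empty documents) unchanged
    return X / norms


class MahalanobisScorer:
    """Outlier score = Mahalanobis distance to the mean of the training vectors.

    Assumption: one Gaussian fitted to all in-distribution documents, not one per class.
    """

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        # Pseudo-inverse tolerates a singular covariance matrix.
        self.precision_ = np.linalg.pinv(cov)
        return self

    def score(self, X):
        diff = X - self.mean_
        # Row-wise quadratic form diff_i^T P diff_i; clip tiny negative round-off before sqrt.
        sq = np.einsum("ij,jk,ik->i", diff, self.precision_, diff)
        return np.sqrt(np.maximum(sq, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 5))  # stand-in for in-distribution document vectors
    test = np.vstack([rng.normal(size=(5, 5)), np.full((1, 5), 8.0)])  # last row is an outlier
    scores = MahalanobisScorer().fit(train).score(test)
    print("most outlying test row:", int(np.argmax(scores)))
```

A design note on why the choice of transformation matters here: when the covariance matrix is full rank, the Mahalanobis distance is unchanged by any invertible affine transformation of the feature space, so for this detector a transformation can only alter the outlier ranking if it is non-invertible or non-affine, such as the row-wise L2 normalization sketched above.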
Pages: 361-370
Number of pages: 10