Feature Transformations for Outlier Detection in Classification of Text Documents

Author
Walkowiak, Tomasz [1]
Affiliation
[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland
DOI
10.1007/978-3-031-06746-4_35
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate how feature transformation influences the results of outlier detection in text documents. We tested four outlier detection methods: Local Outlier Factor, Extreme Value Machine, Weibull-calibrated SVM, and the Mahalanobis distance. The analyzed text documents are represented by different feature vectors, ranging from TF-IDF, through two types of averaged word embeddings, to document embeddings generated by the BERT network. Experimenting on two different text corpora, we show how a transformation of the feature space (the vector representation of documents) influences the outlier detection results.
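As a rough illustration of the idea in the abstract, the sketch below scores two query vectors by their Mahalanobis distance to a set of in-distribution vectors, once on raw features and once after a feature transformation. It is only a minimal sketch under stated assumptions: the 2-dimensional toy vectors, the `log1p` transformation, the ridge term, and all function names are illustrative choices, not the data, transformations, or implementation used in the paper.

```python
import math

def log_transform(v):
    """Illustrative feature transformation (an assumption, not from the paper)."""
    return [math.log1p(x) for x in v]

def mean_and_cov(rows, ridge=1e-6):
    """Sample mean and covariance; a small ridge keeps the matrix invertible."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return mu, cov

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))

def fit_and_score(train, queries, transform=None):
    """Fit mean/covariance on (optionally transformed) data and score queries."""
    if transform is not None:
        train = [transform(r) for r in train]
        queries = [transform(q) for q in queries]
    mu, cov = mean_and_cov(train)
    return [mahalanobis(q, mu, cov) for q in queries]

# Toy in-distribution vectors (made up for illustration).
train = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.5, 2.0],
         [2.0, 2.5], [1.0, 1.0], [3.0, 3.0], [2.0, 2.0]]
# First query resembles the training data; the second does not.
queries = [[2.0, 1.8], [9.0, 0.5]]

raw_scores = fit_and_score(train, queries)
log_scores = fit_and_score(train, queries, transform=log_transform)
print("raw features:  ", raw_scores)
print("log1p features:", log_scores)
```

A higher score means the query lies further from the in-distribution data. The full-covariance Mahalanobis distance is unchanged by invertible linear rescaling of the features (up to the ridge term), so a non-linear transformation was chosen here to make the effect of the transformation visible.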
Pages: 361-370
Number of pages: 10
Related papers
50 in total
  • [21] Text mining in the classification of digital documents
    Contreras Barrera, Marcial
    BIBLIOS-REVISTA DE BIBLIOTECOLOGIA Y CIENCIAS DE LA INFORMACION, 2016, (64): : 33 - 43
  • [22] A fuzzy approach to classification of text documents
    Liu, WY
    Song, N
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (05) : 640 - 647
  • [23] Complex approach to the text documents classification
    Tolcheev, V.O.
    Avtomatizatsiya i Sovremennye Tekhnologii, 2005, (08): : 39 - 45
  • [24] Functional outlier detection and taxonomy by sequential transformations
    Dai, Wenlin
    Mrkvicka, Tomas
    Sun, Ying
    Genton, Marc G.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 149
  • [25] Robust transformations and outlier detection with autocorrelated data
    Cerioli, A
    Riani, M
    FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 262 - +
  • [26] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [27] Feature engineering for text classification
    Scott, S
    Matwin, S
    MACHINE LEARNING, PROCEEDINGS, 1999, : 379 - 388
  • [28] Distance Metrics in Open-Set Classification of Text Documents by Local Outlier Factor and Doc2Vec
    Walkowiak, Tomasz
    Datko, Szymon
    Maciejewski, Henryk
    ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: FROM THEORY TO PRACTICE, 2019, 11606 : 102 - 109
  • [29] Toward text understanding - Classification of text documents by word map
    Visa, A
    Toivonen, J
    Back, B
    Vanharanta, H
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY II, 2000, 4057 : 299 - 305
  • [30] Sibylvariant Transformations for Robust Text Classification
    Harel-Canada, Fabrice
    Gulzar, Muhammad Ali
    Peng, Nanyun
    Kim, Miryung
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1771 - 1788