Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1]
Rensing, Christoph [1]
Silberbauer, Johannes [1]
Affiliation
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a vital concern in text classification for reducing the high dimensionality of the feature space. The wide range of statistical techniques that have been proposed for weighting and selecting features suffer from a loss of the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work, we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our extensive experiments on the EUR-Lex dataset show that the semantic-based feature selection techniques significantly outperform the frequency-based Bag-of-Words (BOW) feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, even after an aggressive reduction of the original feature dimensionality by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
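The baseline named in the abstract selects Bag-of-Words features by frequency and weights them with TF-IDF. The following is a minimal sketch of such a baseline, not the authors' implementation. It assumes lowercase whitespace tokenization, ranking terms by total corpus frequency, and the unsmoothed weighting tf × log(N / df), where N is the number of documents and df is a term's document frequency. The paper's actual preprocessing, selection threshold, and TF-IDF variant are not given in this record, and the helper names `tokenize`, `select_top_k_terms`, and `tfidf_matrix` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; the paper's preprocessing is not described here.
    return text.lower().split()


def select_top_k_terms(docs_tokens, k):
    """Hypothetical helper: keep the k terms with the highest total corpus frequency."""
    counts = Counter()
    for tokens in docs_tokens:
        counts.update(tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def tfidf_matrix(docs_tokens, vocab):
    """Hypothetical helper: weight each selected term as tf * log(N / df)."""
    n_docs = len(docs_tokens)
    doc_sets = [set(tokens) for tokens in docs_tokens]
    # Document frequency: number of documents containing each selected term.
    df = {term: sum(term in s for s in doc_sets) for term in vocab}
    matrix = []
    for tokens in docs_tokens:
        tf = Counter(tokens)  # raw term counts for this document
        row = [tf[term] * math.log(n_docs / df[term]) if df[term] else 0.0
               for term in vocab]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    tokens = [tokenize(d) for d in docs]
    vocab = select_top_k_terms(tokens, k=3)
    print(vocab)  # ['the', 'cat', 'sat']
    for row in tfidf_matrix(tokens, vocab):
        print([round(x, 3) for x in row])
```

Under this unsmoothed variant, a term that occurs in every document (here `the`) receives zero weight. Libraries such as scikit-learn default to a smoothed idf instead, so the numbers differ between implementations.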
Pages: 380-394
Number of pages: 15