A Combined Approach for Multi-Label Text Data Classification

Cited by: 1
Authors
Strimaitis, Rokas [1]
Stefanovic, Pavel [1]
Ramanauskaite, Simona [2]
Slotkiene, Asta [1]
Affiliations
[1] Vilnius Gediminas Tech Univ, Dept Informat Syst, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
[2] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Keywords
Analysis solution - Automated data analysis - Data classification - Data items - Multi-labels - Multilabel - Multinomial naive Bayes - Similarity measure - Text analysis - Text data
DOI
10.1155/2022/3369703
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Automated data analysis solutions depend heavily on the data and its quality. One specificity that needs to be taken into account is the possibility of assigning more than one class to the same data item. No existing solutions dedicated to Lithuanian text data classification can assign more than one class to a data item. In this paper, a new combined approach is proposed for multilabel text data classification. The main aim of the proposed approach is to improve the accuracy of traditional classification algorithms by incorporating the results obtained using similarity measures. The experimental investigation was performed using multilabel financial news text data in the Lithuanian language. The data were collected from four public websites and manually classified by experts into ten classes, with each data item assigned no more than two classes. The results of five commonly used classification algorithms were compared: the support vector machine, multinomial naive Bayes, k-nearest neighbours, decision trees, and linear discriminant analysis. In addition, two similarity measures were compared: cosine similarity and the Dice coefficient. The best results were obtained using cosine similarity and the multinomial naive Bayes classifier, and the proposed approach combines the results of these two methods. Research on different cases of the proposed approach indicated the peculiarities of its application. At the same time, the combined approach yielded a statistically significant increase in global accuracy.
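The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how texts are tokenized or weighted, so the sketch assumes whitespace-split tokens, raw term counts for the cosine measure, and token sets for the Dice coefficient. The function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def dice_coefficient(tokens_a, tokens_b):
    """Dice coefficient between the sets of distinct tokens of two documents."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty documents
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical sample texts, tokenized by whitespace (an assumption).
doc_a = "bank raises interest rates".split()
doc_b = "central bank cuts interest rates".split()

cos_ab = cosine_similarity(doc_a, doc_b)
dice_ab = dice_coefficient(doc_a, doc_b)
```

Both measures return values in [0, 1] for count vectors, with higher values meaning more shared vocabulary. How such scores are merged with the multinomial naive Bayes output is described in the paper itself, not in this record, so no combination rule is sketched here.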
Pages: 13
Related papers
50 in total
  • [41] Multi-label Text Classification with Deep Neural Networks
    Chen, Yun
    Xiao, Bo
    Lin, Zhiqing
    Dai, Cheng
    Li, Zuochao
    Yang, Liping
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 409 - 413
  • [42] Hierarchical Multi-label Classification of Text with Capsule Networks
    Aly, Rami
    Remus, Steffen
    Biemann, Chris
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 323 - 330
  • [43] Correlation Networks for Extreme Multi-label Text Classification
    Xun, Guangxu
    Jha, Kishlay
    Sun, Jianhui
    Zhang, Aidong
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 1074 - 1082
  • [44] Multi-label dataless text classification with topic modeling
    Daochen Zha
    Chenliang Li
    Knowledge and Information Systems, 2019, 61 : 137 - 160
  • [45] A novel reasoning mechanism for multi-label text classification
    Wang, Ran
    Ridley, Robert
    Su, Xi'ao
    Qu, Weiguang
    Dai, Xinyu
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (02)
  • [46] Academic Resource Text Hierarchical Multi-Label Classification
    Wang, Yue
    Li, Yawen
    Li, Ang
    Computer Engineering and Applications, 2023, 59 (13) : 92 - 98
  • [47] Effective Multi-Label Active Learning for Text Classification
    Yang, Bishan
    Sun, Jian-Tao
    Wang, Tengjiao
    Chen, Zheng
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 917 - 925
  • [48] Hierarchical Transfer Learning for Multi-label Text Classification
    Banerjee, Siddhartha
    Akkaya, Cem
    Perez-Sorrosal, Francisco
    Tsioutsiouliklis, Kostas
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6295 - 6300
  • [49] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [50] A NEW INPUT REPRESENTATION FOR MULTI-LABEL TEXT CLASSIFICATION
    Alfaro, Rodrigo
    Allende, Hector
    2011 INTERNATIONAL CONFERENCE ON INSTRUMENTATION, MEASUREMENT, CIRCUITS AND SYSTEMS (ICIMCS 2011), VOL 3: COMPUTER-AIDED DESIGN, MANUFACTURING AND MANAGEMENT, 2011, : 207 - 210