Text Classification Algorithms: A Survey

Cited by: 738
Authors
Kowsari, Kamran [1, 2]
Meimandi, Kiana Jafari [1]
Heidarysafa, Mojtaba [1]
Mendu, Sanjana [1]
Barnes, Laura [1, 2, 3]
Brown, Donald [1, 3]
Affiliations
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Sensing Syst Hlth Lab, Charlottesville, VA 22911 USA
[3] Univ Virginia, Sch Data Sci, Charlottesville, VA 22904 USA
Keywords
text classification; text mining; text representation; text categorization; text analysis; document classification; ROC CURVE; DIMENSIONALITY REDUCTION; LOGISTIC-REGRESSION; COMPONENT ANALYSIS; NEURAL-NETWORK; BAYES THEOREM; NAIVE BAYES; AREA; MODELS; TREE
DOI
10.3390/info10040150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, the number of complex documents and texts has grown exponentially, and classifying them accurately in many applications requires a deeper understanding of machine learning methods. Many machine learning approaches have achieved outstanding results in natural language processing. The success of these learning algorithms relies on their capacity to learn complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. This paper presents a brief overview of text classification algorithms, covering text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their applications to real-world problems are discussed.
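The abstract outlines a pipeline of text feature extraction, a classification algorithm, and evaluation. A minimal sketch of those stages follows, assuming TF-IDF features, a Rocchio-style nearest-centroid classifier, and plain accuracy as the metric; dimensionality reduction is skipped for brevity. The tokenizer, the IDF smoothing variant, every function name, and the toy corpus are hypothetical choices made for illustration, not code from the survey.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems use richer tokenizers.
    return text.lower().split()


def normalize(vec):
    # Scale a sparse vector {term: weight} to unit L2 length (empty stays empty).
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def fit_idf(docs):
    # Assumed smoothed IDF variant: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term frequency times IDF; terms never seen in training are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def fit_centroids(docs, labels, idf):
    # Rocchio / nearest-centroid training: sum each class's TF-IDF vectors,
    # then normalize so every class centroid has unit length.
    sums = {}
    for doc, y in zip(docs, labels):
        acc = sums.setdefault(y, Counter())
        for t, w in tfidf(doc, idf).items():
            acc[t] += w
    return {y: normalize(dict(acc)) for y, acc in sums.items()}


def predict(doc, idf, centroids):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    vec = tfidf(doc, idf)
    return max(
        centroids,
        key=lambda y: sum(w * centroids[y].get(t, 0.0) for t, w in vec.items()),
    )


def accuracy(y_true, y_pred):
    # Fraction of test documents whose predicted label matches the true label.
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy corpus, invented for illustration only.
train_docs = [
    "the team won the football match",
    "a great goal in the final match",
    "the striker scored a goal",
    "the new phone has a fast processor",
    "this laptop has a bright screen",
    "the processor and screen are fast",
]
train_labels = ["sports"] * 3 + ["tech"] * 3
test_docs = ["a late goal won the match", "a fast laptop processor"]
test_labels = ["sports", "tech"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
preds = [predict(d, idf, centroids) for d in test_docs]
print(preds, accuracy(test_labels, preds))
```

Vectors are kept as sparse dictionaries so that each document only pays for the terms it actually contains; with real data one would substitute a proper tokenizer, a held-out evaluation split, and richer metrics such as those the survey's keywords mention (for example ROC curves).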
Pages: 68
Related papers (50 in total)
  • [31] Preferential text classification: learning algorithms and evaluation measures
    Aiolli, Fabio
    Cardin, Riccardo
    Sebastiani, Fabrizio
    Sperduti, Alessandro
    INFORMATION RETRIEVAL, 2009, 12 (05): 559 - 580
  • [32] Text classification using Web corpora and EM algorithms
    Hung, CM
    Chien, LF
    INFORMATION RETRIEVAL TECHNOLOGY, 2005, 3411 : 12 - 23
  • [34] Text mining: A survey of Arabic root extraction algorithms
    Hamza, Manar Ahmed Mohammed
    Ahmed, Tarig Mohamed
    Hilal, Anwer Mustafa Mohamedsalih
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2021, 8 (01): 11 - 19
  • [35] Short Text Clustering Algorithms, Application and Challenges: A Survey
    Ahmed, Majid Hameed
    Tiun, Sabrina
    Omar, Nazlia
    Sani, Nor Samsiah
    APPLIED SCIENCES-BASEL, 2023, 13 (01)
  • [36] A Survey of Brain Tumor Segmentation and Classification Algorithms
    Biratu, Erena Siyoum
    Schwenker, Friedhelm
    Ayano, Yehualashet Megersa
    Debelee, Taye Girma
    JOURNAL OF IMAGING, 2021, 7 (09)
  • [37] A survey on sentiment classification algorithms, challenges and applications
    Rana, Muhammad Rizwan Rashid
    Nawaz, Asif
    Iqbal, Javed
    ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2018, 10 (01): 58 - 72
  • [38] A Survey and Taxonomy of various Packet Classification Algorithms
    Nagpal, Bharti
    Chauhan, Naresh
    Singh, Nanhay
    Murari, Radhika
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015: 8 - +
  • [39] Semantic text classification: A survey of past and recent advances
    Altinel, Berna
    Ganiz, Murat Can
    INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (06): 1129 - 1153
  • [40] A Survey on Text Classification Techniques for Sentiment Polarity Detection
    Arunachalam, N.
    Sneka, Josephine S.
    MadhuMathi, G.
    2017 INNOVATIONS IN POWER AND ADVANCED COMPUTING TECHNOLOGIES (I-PACT), 2017