An Evolutionary Algorithm-Based Text Categorization Technique

Cited by: 1
Authors
Das, Ajit Kumar [1]
Das, Asit Kumar [1]
Sarkar, Apurba [1]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Howrah 711103, W Bengal, India
Keywords
Text mining; Feature selection; Text clustering; Cluster validation; Multi-objective evolutionary algorithm
DOI
10.1007/978-981-10-8055-5_75
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most organizations generate unstructured data, from which extracting meaningful information is a difficult task. Preprocessing unstructured data before mining helps to improve the efficiency of the mining algorithms. In this paper, text data is first preprocessed using tokenization, stop-word removal, and stemming, and a bag-of-words is identified to characterize the text dataset. Next, a genetic algorithm based on the improved strength Pareto evolutionary algorithm is applied to determine a more compact set of informative words for efficient clustering of the text documents. It is a bi-objective genetic algorithm that approximates the Pareto-optimal front while exploring the search space for an optimal solution. An external clustering index and the number of words describing the documents serve as the two objective functions of the algorithm. Chromosomes in the population are evaluated on these functions, and the best chromosome in the non-dominated Pareto front of the final population gives the optimal set of words sufficient for categorization of the text dataset.
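The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the crude suffix stripper `naive_stem`, and all function names are assumptions made for illustration, and the evolutionary search itself (selection, crossover, mutation, archive handling) is omitted. The sketch only shows how a bag-of-words might be built and how candidate word subsets could be compared on two objectives, assuming the clustering index is maximized and the word count is minimized.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on"}


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize into lowercase words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    processed = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in processed for t in doc})
    vectors = [[doc.count(w) for w in vocab] for doc in processed]
    return vocab, vectors


def dominates(a, b):
    """True if a = (index, n_words) dominates b.

    Assumes the clustering index is maximized and the word count minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(scores):
    """Keep only the objective vectors that no other vector dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Made-up objective values for five candidate word subsets (chromosomes):
# (external clustering index, number of selected words).
toy_scores = [(0.90, 5), (0.85, 3), (0.80, 4), (0.90, 6), (0.60, 1)]
front = pareto_front(toy_scores)
```

With these toy values, `(0.80, 4)` is dominated by `(0.85, 3)` and `(0.90, 6)` by `(0.90, 5)`, so three candidates remain on the front. Choosing a single "best" chromosome from that front requires a further rule, which the abstract does not specify.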
Pages: 851-861
Number of pages: 11