An Evolutionary Algorithm-Based Text Categorization Technique

Cited by: 1
Authors
Das, Ajit Kumar [1 ]
Das, Asit Kumar [1 ]
Sarkar, Apurba [1 ]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Howrah 711103, W Bengal, India
Keywords
Text mining; Feature selection; Text clustering; Cluster validation; Multi-objective evolutionary algorithm;
DOI
10.1007/978-981-10-8055-5_75
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In general, most organizations generate unstructured data, from which extracting meaningful information is a difficult task. Preprocessing unstructured data before mining helps improve the efficiency of mining algorithms. In this paper, text data is first preprocessed using tokenization, stop-word removal, and stemming, and a bag-of-words is identified to characterize the text dataset. Next, a genetic algorithm based on the improved strength Pareto evolutionary algorithm is applied to determine a more compact set of informative words for efficient clustering of text documents. It is a bi-objective genetic algorithm that approximates the Pareto-optimal front by exploring the search space for an optimal solution. The external clustering index and the number of words used to describe the documents are the two objective functions of the algorithm. Chromosomes in the population are evaluated on these functions, and the best chromosome in the non-dominated Pareto front of the final population gives the optimal set of words sufficient for categorization of the text dataset.
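The bi-objective selection described in the abstract can be illustrated with a minimal sketch of Pareto dominance over two objectives: an external clustering index to maximize and a word count to minimize. This is not the paper's implementation and not the full strength Pareto algorithm (which also maintains an archive and density-based fitness); the function names and the score values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical score for one chromosome (a candidate word subset):
# (external clustering index, number of selected words).
# Assumption: the index is maximized and the word count is minimized.
Score = Tuple[float, int]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(scores: List[Score]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Illustrative, made-up scores for four chromosomes.
scores = [(0.80, 120), (0.75, 60), (0.70, 90), (0.80, 150)]
front = non_dominated(scores)
print(front)
```

In this made-up example, the third candidate is dominated by the second (higher index with fewer words) and the fourth by the first (same index with fewer words), so only the first two remain on the front. Choosing a single "best" chromosome from that front requires an additional rule that the abstract does not specify.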
Pages: 851-861
Page count: 11