Combining feature selection and classifier ensemble using a multiobjective simulated annealing approach: application to named entity recognition

Cited by: 17
Authors
Ekbal, Asif [1]
Saha, Sriparna [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Patna, Bihar, India
Keywords
Natural language processing; Named entity recognition; Maximum entropy (ME); Conditional random field (CRF); Support vector machine (SVM); Multiobjective optimization (MOO); Simulated annealing (SA); Classifier ensemble; Weighted voting; ALGORITHM; WEB;
DOI
10.1007/s00500-012-0885-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a two-stage multiobjective simulated annealing (MOSA)-based technique for named entity recognition (NER). First, MOSA is used for feature selection under two statistical classifiers, namely conditional random field (CRF) and support vector machine (SVM). Each solution on the final Pareto optimal front provides a different classifier. These classifiers are then combined using a new classifier ensemble technique based on MOSA. Several different versions of the objective functions are exploited. We hypothesize that the reliability of each classifier's predictions differs across the output classes. Thus, in an ensemble system, it is necessary to determine the appropriate vote weight for each output class in each classifier. We propose a MOSA-based technique to determine these vote weights automatically. The proposed two-stage technique is evaluated for NER in Bengali, a resource-poor language, as well as in English. The highest recall, precision and F-measure values obtained are 93.95, 95.15 and 94.55 %, respectively, for Bengali and 89.01, 89.35 and 89.18 %, respectively, for English. Experiments also suggest that the classifier ensemble identified by the proposed MOO-based approach, when optimizing the F-measure values of named entity (NE) boundary detection, outperforms all the individual classifiers and four conventional baseline models.
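The class-specific weighted voting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `weighted_vote`, the label set and the weight values are all hypothetical, and the MOSA search that would produce the weights is not shown. The sketch only shows how per-classifier, per-class weights could be combined once they are known.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine the labels that several classifiers assign to one token.

    predictions: list with one predicted label per classifier.
    weights: weights[i][label] is the vote weight of classifier i for
             that label. The values used below are made up; in the
             paper such weights are determined by a MOSA-based search.
    """
    scores = defaultdict(float)
    for clf_index, label in enumerate(predictions):
        # Each classifier votes for its own prediction with the weight
        # attached to that (classifier, class) pair.
        scores[label] += weights[clf_index].get(label, 0.0)
    # Return the label with the largest total weight; sorting first
    # makes tie-breaking deterministic (alphabetical order).
    return max(sorted(scores), key=lambda lab: scores[lab])


# Hypothetical weights for three classifiers over three output classes.
example_weights = [
    {"PER": 0.9, "LOC": 0.4, "O": 0.6},
    {"PER": 0.3, "LOC": 0.8, "O": 0.7},
    {"PER": 0.5, "LOC": 0.5, "O": 0.9},
]

# Two moderately trusted LOC votes (0.8 + 0.5) outweigh one PER vote (0.9).
print(weighted_vote(["PER", "LOC", "LOC"], example_weights))
```

With these assumed weights, a single classifier can still win a three-way disagreement when its weight for the predicted class is the largest, which is the behaviour that per-class weighting is meant to allow.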
Pages: 1 - 16
Page count: 16