Minimal feature set in language identification and finding suitable classification method with it

Cited by: 4
Authors
Takci, Hidayet [1]
Ekinci, Ekin [1]
Affiliation
[1] Gebze Inst Technol, Fac Engn, Dept Comp Engn, TR-41400 Gebze, Turkey
Keywords
language identification; feature based methods; letter features; weighting factor; classification algorithms
DOI
10.1016/j.protcy.2012.02.099
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Language identification (LI) is a phase of natural language processing. Although LI has been studied before, there is still much work to do to improve performance. This study presents a low-dimensional feature set built from letters and diacritics, and identifies the most suitable classification algorithm for it among C-SVC, MLP, and LDA. In addition, a weight factor was integrated into the language identification system to increase performance. Experiments were conducted on the ECI corpus. The weight factor increased the classification accuracies. For this feature set, C-SVC is both the most accurate and the fastest method. (C) 2011 Published by Elsevier Ltd.
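The idea of a letter-and-diacritic feature set can be illustrated with a minimal sketch. The abstract does not list the actual letters used, so the alphabet `FEATURE_LETTERS` and the function `letter_features` below are hypothetical, and relative frequency is an assumed feature value. The weight factor and the classifiers (C-SVC, MLP, LDA) are not modeled here because the abstract does not define them.

```python
from collections import Counter

# Hypothetical feature alphabet: ASCII letters plus a few letters with
# diacritics. The paper's real feature set is not given in the abstract.
FEATURE_LETTERS = "abcdefghijklmnopqrstuvwxyzçğıöşüäßéèàñ"


def letter_features(text, letters=FEATURE_LETTERS):
    """Return the relative frequency of each feature letter in `text`."""
    # Count only characters that belong to the feature alphabet.
    counts = Counter(ch for ch in text.lower() if ch in letters)
    total = sum(counts.values())
    if total == 0:
        # No feature letters found: return an all-zero vector.
        return [0.0] * len(letters)
    return [counts[ch] / total for ch in letters]


# One fixed-length vector per document, regardless of document length.
features = letter_features("Çok güzel bir gün")
```

A vector like `features` would then be passed to a classifier; keeping the alphabet small is what keeps the feature space low-dimensional.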
Pages: 444-448
Number of pages: 5