Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition

Cited by: 1
Authors
Oudah, Mai [1]
Shaalan, Khaled [2]
Affiliations
[1] Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
[2] British University in Dubai, Dubai International Academic City, United Arab Emirates
Keywords
Named entity recognition; Information extraction; Rule-based approach; Machine learning; Hybrid approach; Natural language processing; Entity recognition
DOI
10.1007/s10579-016-9376-1
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper reports extensive experiments on the impact of different feature categories, both in isolation and added incrementally, on Arabic Person name recognition. We present an integrated system that combines the rule-based approach with the machine learning (ML)-based approach to form a consolidated hybrid system. The feature space comprises language-independent and language-specific features. The explored features fall naturally into six categories: Person named entity tags predicted by the rule-based component, word-level features, POS features, morphological features, gazetteer features, and other contextual features. Because the decision tree algorithm has shown comparatively higher efficiency as a classifier in current state-of-the-art hybrid Named Entity Recognition for Arabic, it is adopted in this study as the ML technique used by the hybrid system. The experiments therefore focus on two dimensions: the standard dataset used and the set of selected features. Several standard datasets are used for training and testing the hybrid system, including ACE (2003-2004) and ANERcorp. The experimental analysis indicates that both language-independent and language-specific features play an important role in overcoming the challenges posed by the Arabic language and have a critical impact on optimizing the performance of the hybrid system.
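The hybrid design described in the abstract, in which the rule-based component's Person tags become one feature category alongside the others, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the gazetteer, trigger words, transliterated tokens, POS tags, and every function and feature name are hypothetical, and the decision tree learner itself is left out.

```python
from typing import Dict, List

# Hypothetical toy resources; the paper's actual gazetteers and rules are not given here.
FIRST_NAME_GAZETTEER = {"khaled", "mai", "ahmed"}
PERSON_TRIGGERS = {"dr", "president", "sheikh"}  # honorifics that often precede a name


def rule_based_tags(tokens: List[str]) -> List[str]:
    """Toy rule-based component: tag a token PER if it is in the gazetteer
    or directly follows a trigger word; otherwise tag it O."""
    tags = []
    for i, tok in enumerate(tokens):
        in_gazetteer = tok.lower() in FIRST_NAME_GAZETTEER
        after_trigger = i > 0 and tokens[i - 1].lower().rstrip(".") in PERSON_TRIGGERS
        tags.append("PER" if in_gazetteer or after_trigger else "O")
    return tags


def token_features(tokens: List[str], i: int, rule_tags: List[str],
                   pos_tags: List[str]) -> Dict[str, object]:
    """Build one feature dict for token i, with one or two illustrative
    features per category named in the abstract."""
    tok = tokens[i]
    return {
        # 1. Person tag predicted by the rule-based component
        "rule_tag": rule_tags[i],
        # 2. Word-level features
        "length": len(tok),
        "suffix2": tok[-2:].lower(),
        # 3. POS feature (tags are supplied by the caller, e.g. from a tagger)
        "pos": pos_tags[i],
        # 4. Morphological feature (a crude stand-in: definite-article prefix)
        "has_al_prefix": tok.lower().startswith("al-"),
        # 5. Gazetteer feature
        "in_gazetteer": tok.lower() in FIRST_NAME_GAZETTEER,
        # 6. Contextual features: rule-based tags of the neighbouring tokens
        "prev_rule_tag": rule_tags[i - 1] if i > 0 else "BOS",
        "next_rule_tag": rule_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }


# Toy transliterated sentence with made-up POS tags.
tokens = ["Dr.", "Khaled", "visited", "Dubai"]
pos_tags = ["NOUN", "PROPN", "VERB", "PROPN"]
rule_tags = rule_based_tags(tokens)
rows = [token_features(tokens, i, rule_tags, pos_tags) for i in range(len(tokens))]
for tok, row in zip(tokens, rows):
    print(tok, row)
```

Each feature dict would then be encoded numerically and passed, together with gold labels, to a decision tree learner. Dropping or adding a group of keys is one simple way to mimic the isolated and incremental feature-category experiments the abstract mentions.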
Pages: 351-378
Page count: 28