Supervised Named Entity Recognition in Assamese language

Citations: 0
Authors
Talukdar, Gitimoni [1]
Borah, Pranjal Protim [1]
Baruah, Arup [1]
Affiliations
[1] Assam Don Bosco Univ, Dept Comp Sci & Engn & IT, Gauhati, India
Keywords
Named Entity Recognition; Corpus; Naive Bayes Classifier; Morphology; Suffix stripping
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nouns play an important role in every natural language. Proper nouns are a subcategory of nouns that represent the names of persons, locations, organizations, and so on. The task of recognizing the proper nouns in a text and categorizing them into classes such as person, location, organization, and other is called Named Entity Recognition (NER). It is an essential step in many natural language processing applications and makes information extraction easier. NER in most Indian languages has been performed using rule-based, supervised, and unsupervised approaches. The target language of this work is Assamese, which is spoken by most people in the north-eastern part of India, particularly in Assam. In Assamese, NER has so far been performed using rule-based and suffix-stripping-based approaches. Supervised learning techniques are more useful than rule-based approaches and can be adapted to new domains more easily. This paper reports the first work on Assamese NER using a machine learning technique: NER is performed with a Naive Bayes classifier. Because feature extraction plays the most important role in the performance of any machine learning technique, this work aims to describe a few important features for Assamese NER and to report the performance of the system using these features.
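As a rough illustration of the kind of system the abstract describes, the following is a minimal sketch of a token-level Naive Bayes tagger with add-one smoothing. It is not the authors' implementation. The abstract does not list the features used, so the ones here (word suffixes of length 1 to 3, the previous word, and a sentence-initial flag) are assumptions chosen to echo the "suffix stripping" keyword. The names `token_features` and `NaiveBayesTagger`, the label set, and the romanized training tokens are all invented placeholders, not real Assamese data.

```python
import math
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Hypothetical feature set: suffixes, previous word, sentence-initial flag."""
    word = tokens[i]
    feats = [f"suffix{n}={word[-n:]}" for n in (1, 2, 3) if len(word) >= n]
    feats.append("prev=" + (tokens[i - 1] if i > 0 else "<s>"))
    feats.append("first=" + str(i == 0))
    return feats


class NaiveBayesTagger:
    """Multinomial Naive Bayes over per-token features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.label_counts = Counter()           # how often each label occurs
        self.feat_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()                      # all features seen in training

    def fit(self, sentences):
        for tokens, labels in sentences:
            for i, label in enumerate(labels):
                self.label_counts[label] += 1
                for feat in token_features(tokens, i):
                    self.feat_counts[label][feat] += 1
                    self.vocab.add(feat)
        return self

    def predict_token(self, tokens, i):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = (sum(self.feat_counts[label].values())
                     + self.alpha * len(self.vocab))
            for feat in token_features(tokens, i):
                score += math.log(
                    (self.feat_counts[label][feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, tokens):
        return [self.predict_token(tokens, i) for i in range(len(tokens))]


# Invented toy corpus: placeholder tokens with PER / LOC / O labels.
TRAIN = [
    (["mina", "nagarpur", "gol"], ["PER", "LOC", "O"]),
    (["rina", "sonapur", "gol"], ["PER", "LOC", "O"]),
    (["dina", "rajpur", "ahil"], ["PER", "LOC", "O"]),
]

tagger = NaiveBayesTagger().fit(TRAIN)
print(tagger.predict(["lina", "rampur", "gol"]))  # ['PER', 'LOC', 'O']
```

On this toy corpus the unseen tokens are labeled from their suffixes and position alone, which is the intuition behind using morphological features for a language with rich suffixation. A real system would need an annotated corpus and a held-out evaluation.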
Pages: 187-191
Number of pages: 5
Related papers
50 in total
  • [1] Named Entity Recognition in Assamese: A Hybrid Approach
    Sharma, Padmaja
    Sharma, Utpal
    Kalita, Jugal
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 2114 - 2120
  • [2] Named Entity Recognition In Assamese using CRFs and Rules
    Sharma, Padmaja
    Sharma, Utpal
    Kalita, Jugal
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 15 - 18
  • [3] AsNER - Annotated Dataset and Baseline for Assamese Named Entity recognition
    Pathak, Dhrubajyoti
    Nandi, Sukumar
    Sarmah, Priyankoo
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6571 - 6577
  • [4] Named Entity Recognition for Mongolian Language
    Munkhjargal, Zoljargal
    Bella, Gabor
    Chagnaa, Altangerel
    Giunchiglia, Fausto
    TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 243 - 251
  • [5] Named Entity Recognition in Marathi Language
    Kale, Shrutika
    Govilkar, Sharvari
    INTERNATIONAL CONFERENCE ON INTELLIGENT DATA COMMUNICATION TECHNOLOGIES AND INTERNET OF THINGS, ICICI 2018, 2019, 26 : 371 - 377
  • [6] Named Entity Recognition for Nepali Language
    Singh, Oyesh Mann
    Padia, Ankur
    Joshi, Anupam
    2019 IEEE 5TH INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC 2019), 2019, : 184 - 190
  • [7] Named entity recognition for the Kazakh language
    Kozhirbayev, Z. M.
    Yessenbayev, Z. A.
    JOURNAL OF MATHEMATICS MECHANICS AND COMPUTER SCIENCE, 2020, 107 (03): : 57 - 66
  • [8] Named Entity Recognition for Sinhala Language
    Dahanayaka, J. K.
    Weerasinghe, A. R.
    14TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER) 2014, 2014, : 215 - 220
  • [9] Named Entity Recognition for the Azerbaijani Language
    Akhundova, Natavan
    2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021,
  • [10] Noise Detection for Distant Supervised Named Entity Recognition
    Wang J.
    Wang K.
    Wang H.
    Du W.
    He Z.
    Ruan T.
    Liu J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (04): : 916 - 928