Named Entity Recognition and Classification for Punjabi Shahmukhi

Cited by: 14
Authors
Ahmad, Muhammad Tayyab [1, 2]
Malik, Muhammad Kamran [1, 2]
Shahzad, Khurram [1, 2]
Aslam, Faisal [1, 2]
Iqbal, Asif [1, 2]
Nawaz, Zubair [1, 2]
Bukhari, Faisal [1, 2]
Affiliations
[1] Punjab Univ Coll Informat Technol, Lahore, Pakistan
[2] Univ Punjab, Punjab Univ Coll Informat Technol, New Campus, Lahore, Pakistan
Keywords
Low-resource languages; Asian languages; Punjabi; Shahmukhi; named entity recognition
DOI
10.1145/3383306
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) refers to identifying proper nouns in natural language text and classifying them into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though the Shahmukhi script of the Punjabi language is used by nearly three-fourths of Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. Specifically, a benchmark NER corpus for Shahmukhi is non-existent, which has thwarted the commencement of NER research for the Shahmukhi script. To this end, this article presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 named entities, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared its specifications with those of its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behavior of the most effective technique are discussed.
Pages: 13
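
As a quick consistency check on the corpus figures quoted in the abstract, the per-type entity counts can be summed and turned into proportions. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative, and the "entities per token" ratio is a rough indicator only, since the abstract does not say how many tokens each entity spans.

```python
# Figures as reported in the abstract; variable names are illustrative.
entity_counts = {"person": 11_147, "location": 3_140, "organization": 2_013}
total_tokens = 318_275
reported_entities = 16_300

# The per-type counts should add up to the reported total.
total_entities = sum(entity_counts.values())
assert total_entities == reported_entities

# Share of each entity type among all named entities.
shares = {label: count / total_entities for label, count in entity_counts.items()}

# Ratio of named entities to tokens (not the fraction of tokens that sit
# inside entities, because multi-token entities are counted once).
entity_density = total_entities / total_tokens

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"entities per token: {entity_density:.2%}")
```

The three counts sum to the reported 16,300 entities, and persons make up roughly two-thirds of them, so the corpus is noticeably skewed toward the person class.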