Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE Access | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; Weak supervision; Zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Available dialectal Arabic linguistic resources cover few Arabic dialects, and the Kuwaiti dialect in particular is poorly served. This shortage hinders researchers in Natural Language Processing (NLP) and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects also remain unexplored because of the difficulty of recruiting annotators for dataset labeling. This paper proposes "q8SentiLabeler", a weakly supervised classification system that addresses the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to avoid bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
Pages: 27709-27722
Page count: 14
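
The abstract reports agreement between the automatic labeler and human annotators as pairwise percent agreement and Cohen's Kappa. As a rough illustration of how these two measures are commonly computed for a pair of label sequences, here is a minimal Python sketch. It is not the authors' code: the function names and the toy labels below are assumptions made for this example, and the record does not describe the paper's actual evaluation setup.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical, not taken from the paper's dataset).
human = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu"]
automatic = ["pos", "neg", "neg", "pos", "neu", "pos", "pos", "neu"]

print(f"percent agreement: {percent_agreement(human, automatic):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(human, automatic):.3f}")
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance, which is why the two figures are usually reported together.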