Comparison of various approaches to tagging for the inflectional Slovak language

Authors
Benko L. [1]
Munkova D. [1]
Pappová M. [1]
Munk M. [1, 2]
Affiliations
[1] Department of Computer Science, Constantine the Philosopher University in Nitra, Nitra
[2] Science and Research Centre, University of Pardubice, Pardubice
Keywords
Automatic taggers; Low-resource language; Morphological annotation; Part-of-speech tagging; Slovak language
DOI
10.7717/PEERJ-CS.2026
Abstract
Morphological tagging provides essential insights into grammar, structure, and the mutual relationships of words within a sentence. Tagging text in a highly inflectional language is challenging because of word ambiguity. This research compares six automatic taggers for the inflectional Slovak language, seeking the most accurate tagger for literary and non-literary texts. Our results indicate that it is useful to differentiate texts into literary and non-literary and then to deploy a tagger based on the text style. For literary texts, UDPipe2 outperformed the others in seven out of nine examined tagset positions. Conversely, for non-literary texts, RNNTagger exhibited the highest performance in eight out of nine examined tagset positions. RNNTagger is recommended for both types of text because it best captures the inflection of the Slovak language, but UDPipe2 demonstrates higher accuracy for literary texts. Despite dataset size limitations, this study emphasizes the suitability of various taggers for inflectional languages like Slovak. © Copyright 2024 Benko et al.
Pages: 1 / 31
Number of pages: 30