Leveraging Large Language Models in Low-resourced Language NLP: A spaCy Implementation for Modern Tibetan

Authors
Kyogoku, Yuki [1]
Erhard, Franz Xaver [1]
Engels, James [2]
Barnett, Robert [3]
Affiliations
[1] Univ Leipzig, Leipzig, Germany
[2] Univ Edinburgh, Edinburgh, Scotland
[3] SOAS Univ London, London, England
Source
REVUE D'ETUDES TIBETAINES | 2025 / Issue 74
DOI
Not available
Chinese Library Classification number
C [Social Sciences, General];
Discipline classification code
03; 0303
Abstract
Large Language Models (LLMs) are transforming the possibilities for developing Natural Language Processing (NLP) tools for low-resource languages. While languages like Modern Tibetan have historically faced significant challenges in computational linguistics due to limited digital resources and annotated datasets, LLMs offer a promising solution. This paper describes how we leveraged Google's Gemini Pro 1.5 to generate training data for developing a basic spaCy language model for Modern Tibetan, focusing particularly on Part-of-Speech (POS) tagging. Combining traditional rule-based approaches with LLM-assisted data annotation, we demonstrate a novel methodology for creating NLP tools for languages with limited computational resources. Our findings contribute to the broader effort to enhance digital accessibility for low-resource languages while offering practical insights for similar projects in computational linguistics.
Pages: 34