Leveraging Large Language Models in Low-resourced Language NLP: A spaCy Implementation for Modern Tibetan

Cited by: 0

Authors:

Kyogoku, Yuki ^{[1]}

Erhard, Franz Xaver ^{[1]}

Engels, James ^{[2]}

Barnett, Robert ^{[3]}

Affiliations:

[1] Univ Leipzig, Leipzig, Germany

[2] Univ Edinburgh, Edinburgh, Scotland

[3] SOAS Univ London, London, England

Source:

REVUE D'ETUDES TIBETAINES | 2025 / Issue 74

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

C [Social Sciences, General];

Discipline classification code:

03 ; 0303 ;

Abstract:

Large Language Models (LLMs) are transforming the possibilities for developing Natural Language Processing (NLP) tools for low-resource languages. While languages like Modern Tibetan have historically faced significant challenges in computational linguistics due to limited digital resources and annotated datasets, LLMs offer a promising solution. This paper describes how we leveraged Google's Gemini Pro 1.5 to generate training data for developing a basic spaCy language model for Modern Tibetan, focusing particularly on Part-of-Speech (POS) tagging. Combining traditional rule-based approaches with LLM-assisted data annotation, we demonstrate a novel methodology for creating NLP tools for languages with limited computational resources. Our findings contribute to the broader effort to enhance digital accessibility for low-resource languages while offering practical insights for similar projects in computational linguistics.


Pages: 34
