Leveraging Large Language Models in Low-resourced Language NLP: A spaCy Implementation for Modern Tibetan

Authors
Kyogoku, Yuki [1]
Erhard, Franz Xaver [1]
Engels, James [2]
Barnett, Robert [3]
Affiliations
[1] Univ Leipzig, Leipzig, Germany
[2] Univ Edinburgh, Edinburgh, Scotland
[3] SOAS Univ London, London, England
Source
REVUE D'ETUDES TIBETAINES | 2025 / Issue 74
Keywords
DOI
Not available
Chinese Library Classification number
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
Large Language Models (LLMs) are transforming the possibilities for developing Natural Language Processing (NLP) tools for low-resource languages. While languages like Modern Tibetan have historically faced significant challenges in computational linguistics due to limited digital resources and annotated datasets, LLMs offer a promising solution. This paper describes how we leveraged Google's Gemini Pro 1.5 to generate training data for developing a basic spaCy language model for Modern Tibetan, focusing particularly on Part-of-Speech (POS) tagging. Combining traditional rule-based approaches with LLM-assisted data annotation, we demonstrate a novel methodology for creating NLP tools for languages with limited computational resources. Our findings contribute to the broader effort to enhance digital accessibility for low-resource languages while offering practical insights for similar projects in computational linguistics.
Pages: 34