Fine-tuning large language models for improved health communication in low-resource languages

Cited by: 0
Authors
Bui, Nhat [1 ]
Nguyen, Giang [1 ]
Nguyen, Nguyen [1 ]
Vo, Bao [1 ]
Vo, Luan [1 ]
Huynh, Tom [1 ]
Tang, Arthur [1 ]
Tran, Van Nhiem [2 ]
Huynh, Tuyen [3 ]
Nguyen, Huy Quang [3 ]
Dinh, Minh [1 ]
Affiliations
[1] RMIT Univ, Sch Sci Engn & Technol, Ho Chi Minh City, Vietnam
[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan
[3] Oxford Univ Clin Res Unit OUCRU, Ho Chi Minh City, Vietnam
Keywords
Artificial intelligence; Large language model; Low-resource languages; Health communication and promotion; Data privacy and security; Health equity
DOI
10.1016/j.cmpb.2025.108655
CLC classification number
TP39 [Computer applications]
Subject classification number
081203; 0835
Abstract
Background: This study presents a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and to improve healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs.
Method: The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model on that dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese medical online forums, and content distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method.
Results: The fine-tuned models outperformed their base versions on BERTScore, ROUGE-L, and the "LLM-as-a-Judge" evaluation, confirming the effectiveness of the fine-tuning process. The study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese and demonstrates its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security; however, the substantial computing power and costs required pose challenges, especially for organizations in developing countries.
Conclusion: This case study highlights the unique challenges faced by developing countries that use low-resource languages. Targeted initiatives are needed to bridge healthcare gaps in underserved areas and to contribute to global health equity.
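The record names the fine-tuning and evaluation techniques but gives no implementation detail, so the two sketches below are purely illustrative. The first is a minimal QLoRA fine-tuning setup for Vietnamese prompt-response pairs using the Hugging Face transformers, peft, datasets, and bitsandbytes libraries; the base model name, hyperparameters, and dataset file are assumptions for illustration, not values reported by the authors.

```python
# Minimal QLoRA fine-tuning sketch (illustrative; model name, hyperparameters,
# and data path are assumptions, not the paper's reported configuration).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "vilm/vinallama-7b"  # hypothetical Vietnamese-capable base model

# Load the base model in 4-bit precision (the "Q" in QLoRA) to reduce GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Attach small trainable low-rank adapters to the attention projections (LoRA).
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Turn each {"prompt", "response"} pair into one tokenized training sequence.
def to_features(example):
    text = example["prompt"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=1024)

dataset = load_dataset("json", data_files="vi_health_qa.jsonl")["train"]  # hypothetical file
dataset = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qlora-vi-health",
                           per_device_train_batch_size=4,
                           gradient_accumulation_steps=4,
                           num_train_epochs=1, learning_rate=2e-4,
                           bf16=True, logging_steps=50),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("qlora-vi-health-adapter")  # saves only the small adapter weights
```

The second sketch shows one way the reference-based metrics mentioned in the abstract (BERTScore and ROUGE-L) could be computed with the Hugging Face evaluate library; the Vietnamese prediction and reference strings are placeholders, not data from the study.

```python
# Reference-based evaluation sketch with BERTScore and ROUGE-L (illustrative only;
# the prediction/reference strings are placeholders, not data from the study).
import evaluate

predictions = ["Bạn nên uống đủ nước và nghỉ ngơi."]        # placeholder model output
references = ["Hãy uống nhiều nước và nghỉ ngơi đầy đủ."]   # placeholder gold answer

bertscore = evaluate.load("bertscore")  # compares contextual token embeddings
rouge = evaluate.load("rouge")          # ROUGE-L: longest-common-subsequence overlap

bs = bertscore.compute(predictions=predictions, references=references, lang="vi")
rl = rouge.compute(predictions=predictions, references=references)

print("BERTScore F1:", sum(bs["f1"]) / len(bs["f1"]))
print("ROUGE-L:", rl["rougeL"])
```

Running both steps on local hardware mirrors the on-premise deployment the abstract recommends for data privacy and security, at the cost of provisioning suitable GPUs.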
Pages: 11
Related papers
50 records in total
  • [1] adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
    Lankford, Seamus
    Afli, Haithem
    Way, Andy
    INFORMATION, 2023, 14 (12)
  • [2] Toward Low-Resource Languages Machine Translation: A Language-Specific Fine-Tuning With LoRA for Specialized Large Language Models
    Liang, Xiao
    Khaw, Yen-Min Jasmina
    Liew, Soung-Yue
    Tan, Tien-Ping
    Qin, Donghong
    IEEE ACCESS, 2025, 13 : 46616 - 46626
  • [3] Fine Tuning Language Models: A Tale of Two Low-Resource Languages
    Rosel Oida Onesa
    Melvin A. Ballera
    DATA INTELLIGENCE, 2024, 6 (04) : 946 - 967
  • [4] Fine-Tuning ASR models for Very Low-Resource Languages: A Study on Mvskoke
    Mainzinger, Julia
    Levow, Gina-Anne
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 4: STUDENT RESEARCH WORKSHOP, 2024, : 94 - 100
  • [5] Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models
    Zhou, Mingjun
    Daiqing, Zhuoma
    Qun, Nuo
    Nyima, Tashi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 410 - 422
  • [6] Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
    Dhananjaya, Vinura
    Ranathunga, Surangika
    Jayasena, Sanath
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (05) : 1116 - 1125
  • [7] AgglutiFiT: Efficient Low-Resource Agglutinative Language Model Fine-Tuning
    Li, Zhe
    Li, Xiuhong
    Sheng, Jiabao
    Slamu, Wushour
    IEEE ACCESS, 2020, 8 : 148489 - 148499
  • [8] Large Language Models and Low-Resource Languages: An Examination of Armenian NLP
    Avetisyan, Hayastan
    Broneske, David
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 199 - 210
  • [9] Exploration of Whisper fine-tuning strategies for low-resource ASR
    Liu, Yunpeng
    Yang, Xukui
    Qu, Dan
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [10] Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages
    Do, Phat
    Coler, Matt
    Dijkstra, Jelske
    Klabbers, Esther
    INTERSPEECH 2023, 2023, : 5466 - 5470