Fine-tuning large language models for improved health communication in low-resource languages

Cited by: 0
Authors
Bui, Nhat [1 ]
Nguyen, Giang [1 ]
Nguyen, Nguyen [1 ]
Vo, Bao [1 ]
Vo, Luan [1 ]
Huynh, Tom [1 ]
Tang, Arthur [1 ]
Tran, Van Nhiem [2 ]
Huynh, Tuyen [3 ]
Nguyen, Huy Quang [3 ]
Dinh, Minh [1 ]
Affiliations
[1] RMIT Univ, Sch Sci Engn & Technol, Ho Chi Minh City, Vietnam
[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan
[3] Oxford Univ Clin Res Unit OUCRU, Ho Chi Minh City, Vietnam
Keywords
Artificial intelligence; Large language model; Low-resource languages; Health communication and promotion; Data privacy and security; Health equity
DOI
10.1016/j.cmpb.2025.108655
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Background: This study presents a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and to enhance healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs.

Method: The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model with that dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese medical online forums, and material distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method.

Results: The fine-tuned models outperformed their base versions on BERTScore, ROUGE-L, and the "LLM-as-a-Judge" evaluation, confirming the effectiveness of the fine-tuning process. This study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security. However, the significant computing power and costs required pose challenges, especially for organizations in developing countries.

Conclusion: This case study highlights the unique challenges faced by developing countries using low-resource languages. Initiatives are needed to bridge healthcare gaps in underserved areas and to contribute to global health equity.
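To make the fine-tuning method named in the abstract concrete, here is a minimal sketch of LoRA adapter training in the Hugging Face ecosystem (transformers, peft, datasets). The base model, data file, target modules, and all hyperparameters below are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face transformers + peft.
# BASE, the dataset file, target_modules, and all hyperparameters are
# placeholder assumptions, not the paper's reported setup.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "bigscience/bloom-560m"  # small stand-in for the study's open models

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# LoRA freezes the base weights and learns low-rank updates on the attention
# projections; QLoRA additionally loads the base model in 4-bit via
# BitsAndBytesConfig(load_in_4bit=True) to cut memory further.
lora = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                  lora_dropout=0.05, target_modules=["query_key_value"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Prompt-response pairs, one JSON object per line:
# {"prompt": "...", "response": "..."}
ds = load_dataset("json", data_files="viet_medical_pairs.jsonl")["train"]

def tokenize(batch):
    texts = [p + "\n" + r for p, r in zip(batch["prompt"], batch["response"])]
    return tok(texts, truncation=True, max_length=512)

ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```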
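Similarly, the two automatic metrics named in the abstract, BERTScore and ROUGE-L, can be computed with the `evaluate` library. The sketch below uses made-up toy Vietnamese strings; the paper's exact evaluation pipeline is not reproduced here.

```python
# Sketch of the two automatic metrics from the abstract, via `evaluate`.
# The Vietnamese strings are made-up toy examples.
import evaluate

preds = ["Bạn nên nghỉ ngơi và uống nhiều nước."]
refs = ["Bạn nên uống đủ nước và nghỉ ngơi đầy đủ."]

# BERTScore compares contextual token embeddings; lang="vi" selects a
# multilingual backbone suitable for Vietnamese.
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=preds, references=refs, lang="vi")
print("BERTScore F1:", bs["f1"])

# ROUGE-L scores the longest common subsequence overlap. Its default
# whitespace tokenization is only a rough fit for Vietnamese, which the
# paper's pipeline may handle differently.
rouge = evaluate.load("rouge")
rl = rouge.compute(predictions=preds, references=refs, rouge_types=["rougeL"])
print("ROUGE-L:", rl["rougeL"])
```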
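The third evaluation, "LLM-as-a-Judge", prompts a stronger model to grade each generated response. A hedged sketch of that pattern follows, assuming an OpenAI-style chat API; the judge model name and rubric wording are hypothetical, not the study's setup.

```python
# Sketch of "LLM-as-a-Judge": a stronger model grades a candidate answer
# against a reference on a fixed rubric. The judge model and prompt wording
# are assumptions; the paper's exact judging setup is not shown here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, candidate: str, reference: str) -> str:
    prompt = (
        "You are grading a Vietnamese medical answer.\n"
        f"Question: {question}\n"
        f"Candidate answer: {candidate}\n"
        f"Reference answer: {reference}\n"
        "Rate the candidate from 1 (poor) to 5 (excellent) for medical "
        "accuracy and helpfulness, then justify the score in one sentence."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```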
Pages: 11
Related Papers
50 records in total
  • [41] On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages
    Chen, Fuxiang
    Fard, Fatemeh H.
    Lo, David
    Bryksin, Timofey
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022, : 401 - 412
  • [42] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 168 - 174
  • [43] Personalized Large Language Models through Parameter Efficient Fine-Tuning Techniques
    Braga, Marco
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 3076 - 3076
  • [44] CSAFT: Continuous Semantic Augmentation Fine-Tuning for Legal Large Language Models
    Li, Bo
    Fan, Shuang
    Huang, Jin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT V, 2024, 15020 : 293 - 307
  • [45] Selective privacy-preserving framework for large language models fine-tuning
    Wang, Teng
    Zhai, Lindong
    Yang, Tengfei
    Luo, Zhucheng
    Liu, Shuanggen
    INFORMATION SCIENCES, 2024, 678
  • [46] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
    Zong, Yongshuo
    Bohdal, Ondrej
    Yu, Tingyang
    Yang, Yongxin
    Hospedales, Timothy
PROCEEDINGS OF MACHINE LEARNING RESEARCH, 2024, 235 : 62867 - 62891
  • [47] Parameter-efficient fine-tuning of large language models using semantic knowledge tuning
    Prottasha, Nusrat Jahan
    Mahmud, Asif
    Sobuj, Md. Shohanur Islam
    Bhat, Prakash
    Kowsher, Md
    Yousefi, Niloofar
    Garibay, Ozlem Ozmen
SCIENTIFIC REPORTS, 2024, 14 (01)
  • [48] A two-stage fine-tuning method for low-resource cross-lingual summarization
    Zhang, Kaixiong
    Zhang, Yongbing
    Yu, Zhengtao
    Huang, Yuxin
    Tan, Kaiwen
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2024, 21 (01) : 1125 - 1143
  • [49] DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning
    Homskiy, Daniil
    Maloyan, Narek
    17TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2023, 2023, : 1537 - 1541
  • [50] Improved Visual Fine-tuning with Natural Language Supervision
    Wang, Junyang
    Xu, Yuanhong
    Hu, Juhua
    Yan, Ming
    Sang, Jitao
    Qian, Qi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11865 - 11875