Fine-tuning large language models for improved health communication in low-resource languages

Cited by: 0
Authors
Bui, Nhat [1 ]
Nguyen, Giang [1 ]
Nguyen, Nguyen [1 ]
Vo, Bao [1 ]
Vo, Luan [1 ]
Huynh, Tom [1 ]
Tang, Arthur [1 ]
Tran, Van Nhiem [2 ]
Huynh, Tuyen [3 ]
Nguyen, Huy Quang [3 ]
Dinh, Minh [1 ]
Affiliations
[1] RMIT Univ, Sch Sci Engn & Technol, Ho Chi Minh City, Vietnam
[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan
[3] Oxford Univ Clin Res Unit OUCRU, Ho Chi Minh City, Vietnam
Keywords
Artificial intelligence; Large language model; Low-resource languages; Health communication and promotion; Data privacy and security; Health equity
DOI
10.1016/j.cmpb.2025.108655
CLC Number (Chinese Library Classification)
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Background: This study presents a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and enhance healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs.
Method: The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model with that dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese medical online forums, and material distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method.
Results: The fine-tuned models outperformed their base versions on all three evaluation methods, confirming the effectiveness of the fine-tuning process. The study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security; however, the significant computing power and costs required pose challenges, especially for organizations in developing countries.
Conclusion: This case study highlights the unique challenges faced by developing countries using low-resource languages. Targeted initiatives are needed to bridge healthcare gaps in underserved areas and contribute to global health equity.
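For readers unfamiliar with the techniques named in the abstract, the sketch below shows how QLoRA-style fine-tuning is commonly set up with the Hugging Face transformers, peft, bitsandbytes, and datasets libraries. This is a minimal illustration under stated assumptions, not the authors' pipeline: the base model name, the vi_medical_pairs.jsonl file, and the prompt/response field names are placeholders, and the LoRA hyperparameters (rank, alpha, dropout) are common defaults rather than values from the paper.

# Minimal QLoRA fine-tuning sketch (illustrative; not the paper's actual code).
# Assumes a Llama-family base model and a JSONL file of Vietnamese
# prompt-response pairs with hypothetical "prompt"/"response" fields.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit NF4 quantization of the frozen base weights -- the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections -- the "LoRA" part.
# r, alpha, and dropout are illustrative defaults, not the paper's settings.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of base weights

# Hypothetical dataset: one JSON object per line with "prompt" and "response"
dataset = load_dataset("json", data_files="vi_medical_pairs.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(output_dir="llm-vi-health", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llm-vi-health-adapter")  # saves only the small adapter weights

The two automatic metrics can likewise be sketched in a few lines with the bert-score and rouge-score packages; the Vietnamese sentences below are invented placeholders, not examples from the study's test set.

# Sketch of the two automatic metrics; candidate/reference strings are placeholders.
from bert_score import score as bert_score
from rouge_score import rouge_scorer

candidate = "Bạn nên uống đủ nước và nghỉ ngơi nhiều hơn."  # model output (placeholder)
reference = "Hãy uống nhiều nước và nghỉ ngơi đầy đủ."      # gold answer (placeholder)

# BERTScore: token-level similarity under a pretrained encoder;
# lang="vi" selects a multilingual model suitable for Vietnamese.
P, R, F1 = bert_score([candidate], [reference], lang="vi")
print(f"BERTScore F1: {F1.mean().item():.3f}")

# ROUGE-L: F-measure of the longest common subsequence between the two texts.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
print(f"ROUGE-L F1: {scorer.score(reference, candidate)['rougeL'].fmeasure:.3f}")

The third evaluation method, "LLM-as-a-Judge", has no single standard package; it typically prompts a stronger model to grade each answer against the reference, so no snippet is shown here.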
Pages: 11
Related Papers
50 records in total
  • [31] Repeatability of Fine-Tuning Large Language Models Illustrated Using QLoRA
    Alahmari, Saeed S.
    Hall, Lawrence O.
    Mouton, Peter R.
    Goldgof, Dmitry B.
    IEEE ACCESS, 2024, 12 : 153221 - 153231
  • [32] Fine-tuning large language models for rare disease concept normalization
    Wang, Andy
    Liu, Cong
    Yang, Jingye
    Weng, Chunhua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 2076 - 2083
  • [33] Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
    Liu, Linlin
    Li, Xingxuan
    Thakkar, Megh
    Li, Xin
    Joty, Shafiq
    Si, Luo
    Bing, Lidong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4799 - 4816
  • [34] Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
    Dabre, Raj
    Fujita, Atsushi
    Chu, Chenhui
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1410 - 1416
  • [35] GlotLID: Language Identification for Low-Resource Languages
    Kargaran, Amir Hossein
    Imani, Ayyoob
    Yvon, Francois
    Schuetze, Hinrich
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6155 - 6218
  • [36] Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
    Adeyemi, Mofetoluwa
    Oladipo, Akintunde
    Pradeep, Ronak
    Lin, Jimmy
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 650 - 656
  • [37] Exploring Large Language Models for Low-Resource IT Information Extraction
    Bhavya, Bhavya
    Isaza, Paulina Toro
    Deng, Yu
    Nidd, Michael
    Azad, Amar Prakash
    Shwartz, Larisa
    Zhai, ChengXiang
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1203 - 1212
  • [38] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
    Zong, Yongshuo
    Bohdal, Ondrej
    Yu, Tingyang
    Yang, Yongxin
    Hospedales, Timothy
    arXiv preprint, 2024
  • [39] Parameter-efficient fine-tuning in large language models: a survey of methodologies
    Wang, Luping
    Chen, Sheng
    Jiang, Linnan
    Pan, Shu
    Cai, Runze
    Yang, Sen
    Yang, Fei
    ARTIFICIAL INTELLIGENCE REVIEW, 58 (8)
  • [40] Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction
    Chen, Boqi
    Yi, Fandi
    Varro, Daniel
    2023 ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS COMPANION, MODELS-C, 2023, : 588 - 596