Toward Low-Resource Languages Machine Translation: A Language-Specific Fine-Tuning With LoRA for Specialized Large Language Models

Cited: 0
Authors
Liang, Xiao [1 ,2 ]
Khaw, Yen-Min Jasmina [1 ]
Liew, Soung-Yue [3 ]
Tan, Tien-Ping [4 ]
Qin, Donghong [2 ]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Dept Comp Sci, Kampar 31900, Malaysia
[2] Guangxi Minzu Univ, Sch Artificial Intelligence, Nanning 530008, Peoples R China
[3] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Dept Comp & Commun Technol, Kampar 31900, Malaysia
[4] Univ Sains Malaysia, Sch Comp Sci, George Town 11700, Malaysia
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Machine translation; low-resource languages; large language models; parameter-efficient fine-tuning; LoRA
DOI
10.1109/ACCESS.2025.3549795
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In the field of computational linguistics, addressing machine translation (MT) challenges for low-resource languages remains crucial, as these languages often lack extensive data compared to high-resource languages. General large language models (LLMs), such as GPT-4 and Llama, primarily trained on monolingual corpora, face significant challenges in translating low-resource languages, often resulting in subpar translation quality. This study introduces Language-Specific Fine-Tuning with Low-rank adaptation (LSFTL), a method that enhances translation for low-resource languages by optimizing the multi-head attention and feed-forward networks of Transformer layers through low-rank matrix adaptation. LSFTL preserves the majority of the model parameters while selectively fine-tuning key components, thereby maintaining stability and enhancing translation quality. Experiments on non-English-centered low-resource Asian languages demonstrated that LSFTL improved COMET scores by 1-3 points compared to specialized multilingual machine translation models. Additionally, LSFTL's parameter-efficient approach allows smaller models to achieve performance comparable to their larger counterparts, highlighting its significance in making machine translation systems more accessible and effective for low-resource languages.
Pages: 46616 - 46626
Page count: 11
Related Papers
50 records in total
  • [1] Fine-tuning large language models for improved health communication in low-resource languages
    Bui, Nhat
    Nguyen, Giang
    Nguyen, Nguyen
    Vo, Bao
    Vo, Luan
    Huynh, Tom
    Tang, Arthur
    Tran, Van Nhiem
    Huynh, Tuyen
    Nguyen, Huy Quang
    Dinh, Minh
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2025, 263
  • [2] adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
    Lankford, Seamus
    Afli, Haithem
    Way, Andy
    INFORMATION, 2023, 14 (12)
  • [3] Fine Tuning Language Models: A Tale of Two Low-Resource Languages
Onesa, Rosel Oida
Ballera, Melvin A.
DATA INTELLIGENCE, 2024, 6 (04) : 946 - 967
  • [4] Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
    Dhananjaya, Vinura
    Ranathunga, Surangika
    Jayasena, Sanath
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (05) : 1116 - 1125
  • [5] Efficient Fine-Tuning for Low-Resource Tibetan Pre-trained Language Models
    Zhou, Mingjun
    Daiqing, Zhuoma
    Qun, Nuo
    Nyima, Tashi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 410 - 422
  • [6] AgglutiFiT: Efficient Low-Resource Agglutinative Language Model Fine-Tuning
    Li, Zhe
    Li, Xiuhong
    Sheng, Jiabao
    Slamu, Wushour
    IEEE ACCESS, 2020, 8 : 148489 - 148499
  • [7] Fine-Tuning ASR models for Very Low-Resource Languages: A Study on Mvskoke
    Mainzinger, Julia
    Levow, Gina-Anne
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 4: STUDENT RESEARCH WORKSHOP, 2024, : 94 - 100
  • [8] Improving Machine Translation Capabilities by Fine-Tuning Large Language Models and Prompt Engineering with Domain-Specific Data
    Laki, Laszlo Janos
    Yang, Zijian Gyozo
    2024 IEEE 3RD CONFERENCE ON INFORMATION TECHNOLOGY AND DATA SCIENCE, CITDS 2024, 2024, : 129 - 133
  • [9] Machine Translation into Low-resource Language Varieties
    Kumar, Sachin
    Anastasopoulos, Antonios
    Wintner, Shuly
    Tsvetkov, Yulia
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 110 - 121
  • [10] Large Language Models and Low-Resource Languages: An Examination of Armenian NLP
    Avetisyan, Hayastan
    Broneske, David
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 199 - 210