Multimodal Seed Data Augmentation for Low-Resource Audio Latin Cuengh Language

Cited by: 0

Authors:

Jiang, Lanlan [1]

Qin, Xingguo [2]

Zhang, Jingwei [2]

Li, Jun [2]

Affiliations:

[1] Guilin Univ Elect Technol, Sch Business, Guilin 541004, Peoples R China

[2] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541004, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 20

Funding:

National Natural Science Foundation of China;

Keywords:

seed data augmentation; low-resource data; Latin Cuengh language; multimodal;

DOI:

10.3390/app14209533

Chinese Library Classification number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

Latin Cuengh is a low-resource dialect spoken in certain ethnic minority regions of China. It poses particular challenges for computational research and preservation efforts, primarily because of its oral tradition and the limited availability of textual resources. Prior research has sought to improve automatic processing of Latin Cuengh through data augmentation techniques that draw on scarce textual data, with modest success. In this study, we introduce a multimodal seed data augmentation model designed to improve the automatic recognition and comprehension of this dialect. After supplementing the pre-trained model with extensive speech data, we fine-tune it on a modest corpus of multilingual textual seed data, using both Latin Cuengh and Chinese texts as bilingual seed data to enrich its multilingual properties. We then refine its parameters through a variety of downstream tasks. The proposed model performs well on both multi-class and binary classification tasks, with its average accuracy and F1 measure increasing by more than 3%. Moreover, seed data augmentation substantially improves the model's training efficiency. Our research provides insights into the informatization of low-resource languages and contributes to their dissemination and preservation.

Pages: 13
