Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Cited by: 0

Authors:

Sorokin, Alexey [1]

Affiliations:

[1] Moscow MV Lomonosov State Univ, Moscow Inst Phys & Technol, Fac Math & Mech, Leninskie Gory,GSP 1, Moscow, Russia

Source:

PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020

Keywords:

inflection; encoder-decoder; abstract paradigms; language models; data augmentation;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

We investigate the effect of data augmentation on low-resource morphological inflection. We compare two settings: the pure low-resource one, where only 100 annotated word forms are available, and the augmented one, where we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the SIGMORPHON 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.

Pages: 3978 - 3983

Page count: 6

50 items in total

[41] Enhancing African low-resource languages: Swahili data for language modelling
Shikali, Casper S.
Mokhosi, Refuoe
DATA IN BRIEF, 2020, 31
[42] Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu
Ullah, Fida
Gelbukh, Alexander
Zamir, Muhammad Tayyab
Riveron, Edgardo Manuel Felipe
Sidorov, Grigori
COMPUTERS, 2024, 13 (10)
[43] Contrastive Learning for Morphological Disambiguation Using Large Language Models in Low-Resource Settings
Tolegen, Gulmira
Toleu, Alymzhan
Mussabayev, Rustam
APPLIED SCIENCES-BASEL, 2024, 14 (21)
[44] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
Zolzaya Byambadorj
Ryota Nishimura
Altangerel Ayush
Kengo Ohta
Norihide Kitaoka
EURASIP Journal on Audio, Speech, and Music Processing, 2021
[45] Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Comini, Giulia
Huybrechts, Goeric
Ribeiro, Manuel Sam
Gabrys, Adam
Lorenzo-Trueba, Jaime
INTERSPEECH 2022, 2022 : 1946 - 1950
[46] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
Byambadorj, Zolzaya
Nishimura, Ryota
Ayush, Altangerel
Ohta, Kengo
Kitaoka, Norihide
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
[47] Efficient Data Augmentation via lexical matching for boosting performance on Statistical Machine Translation for Indic and a Low-resource language
Saxena, Shefali
Gupta, Ayush
Daniel, Philemon
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (24) : 64255 - 64269
[48] Cognate Projection for Low-Resource Inflection Generation
Hauer, Bradley
Habibi, Amir A.
Luan, Yixing
Riyadh, Rashed Rubby
Kondrak, Grzegorz
16TH SIGMORPHON WORKSHOP ON COMPUTATIONAL RESEARCH IN PHONETICS, PHONOLOGY, AND MORPHOLOGY (SIGMORPHON 2019), 2019 : 6 - 11
[49] Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation
Liu, Zoey
Prud'hommeaux, Emily
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 393 - 413
[50] On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese
Vu, Huan
Bui, Ngoc Dung
JOURNAL OF INFORMATION AND TELECOMMUNICATION, 2023, 7 (02) : 241 - 253