On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Cited: 0
Authors
Wang, Zirui [1]
Lipton, Zachary C. [1]
Tsvetkov, Yulia [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Modern multilingual models are trained on concatenated text from multiple languages in the hope of conferring benefits to each (positive transfer), with the most pronounced benefits expected for low-resource languages. However, recent work has shown that this approach can degrade performance on high-resource languages, a phenomenon known as negative interference. In this paper, we present the first systematic study of negative interference. We show that, contrary to previous belief, negative interference also impacts low-resource languages. While parameters are maximally shared to learn language-universal structures, we demonstrate that language-specific parameters do exist in multilingual models and that they are a potential cause of negative interference. Motivated by these observations, we also present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference, by adding language-specific layers as meta-parameters and training them in a manner that explicitly improves the shared layers' generalization on all languages. Overall, our results show that negative interference is more common than previously known, suggesting new directions for improving multilingual representations.
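The bilevel idea described in the abstract — language-specific layers act as meta-parameters, and the shared parameters are updated so that, after adaptation with each language's layer, they generalize across all languages — can be sketched in a toy first-order form. This is not the authors' implementation: the per-language quadratic losses, the language IDs, and the learning rates below are illustrative assumptions standing in for transformer layers and per-language NLL objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
languages = ["en", "sw"]                     # hypothetical language IDs
targets = {"en": np.array([1.0, 0.0]),      # toy per-language optima
           "sw": np.array([0.0, 1.0])}

theta = rng.normal(size=2)                  # shared parameters
phi = {l: np.zeros(2) for l in languages}   # language-specific meta-parameters

def loss(theta, phi_l, lang):
    # toy surrogate for a per-language loss: quadratic around a target
    return 0.5 * np.sum((theta + phi_l - targets[lang]) ** 2)

def grad(theta, phi_l, lang):
    # gradient of the quadratic loss w.r.t. (theta + phi_l)
    return theta + phi_l - targets[lang]

inner_lr, meta_lr = 0.1, 0.05
for step in range(500):
    meta_grad = np.zeros_like(theta)
    for l in languages:
        # inner step: adapt the shared parameters on language l,
        # conditioned on that language's specific layer phi[l]
        theta_l = theta - inner_lr * grad(theta, phi[l], l)
        # outer step (first-order): measure generalization after adaptation
        g = grad(theta_l, phi[l], l)
        meta_grad += g
        phi[l] -= meta_lr * g               # update language-specific layer
    # shared parameters move toward good post-adaptation loss on ALL languages
    theta -= meta_lr * meta_grad / len(languages)

final = {l: loss(theta, phi[l], l) for l in languages}
```

In this sketch both per-language losses shrink toward zero because each language gets its own `phi[l]`, so the shared `theta` is no longer forced to a compromise that interferes with any single language — the intuition the paper's algorithm builds on at scale.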
Pages: 4438–4450 (13 pages)