On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Cited: 0
Authors
Wang, Zirui [1]
Lipton, Zachary C. [1]
Tsvetkov, Yulia [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Modern multilingual models are trained on concatenated text from multiple languages in the hope of conferring benefits to each (positive transfer), with the most pronounced benefits expected for low-resource languages. However, recent work has shown that this approach can degrade performance on high-resource languages, a phenomenon known as negative interference. In this paper, we present the first systematic study of negative interference. We show that, contrary to previous belief, negative interference also impacts low-resource languages. While parameters are maximally shared to learn language-universal structures, we demonstrate that language-specific parameters do exist in multilingual models and that they are a potential cause of negative interference. Motivated by these observations, we also present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference, by adding language-specific layers as meta-parameters and training them in a manner that explicitly improves the shared layers' generalization on all languages. Overall, our results show that negative interference is more common than previously known, suggesting new directions for improving multilingual representations.
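The bilevel idea described in the abstract — language-specific layers act as meta-parameters, and the shared parameters are updated so that, after adaptation with each language's layer, they generalize across all languages — can be sketched in a toy first-order form. This is not the authors' implementation: the per-language quadratic losses, the language IDs, and the learning rates below are illustrative assumptions standing in for transformer layers and per-language NLL objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
languages = ["en", "sw"]                     # hypothetical language IDs
targets = {"en": np.array([1.0, 0.0]),      # toy per-language optima
           "sw": np.array([0.0, 1.0])}

theta = rng.normal(size=2)                  # shared parameters
phi = {l: np.zeros(2) for l in languages}   # language-specific meta-parameters

def loss(theta, phi_l, lang):
    # toy surrogate for a per-language loss: quadratic around a target
    return 0.5 * np.sum((theta + phi_l - targets[lang]) ** 2)

def grad(theta, phi_l, lang):
    # gradient of the quadratic loss w.r.t. (theta + phi_l)
    return theta + phi_l - targets[lang]

inner_lr, meta_lr = 0.1, 0.05
for step in range(500):
    meta_grad = np.zeros_like(theta)
    for l in languages:
        # inner step: adapt the shared parameters on language l,
        # conditioned on that language's specific layer phi[l]
        theta_l = theta - inner_lr * grad(theta, phi[l], l)
        # outer step (first-order): measure generalization after adaptation
        g = grad(theta_l, phi[l], l)
        meta_grad += g
        phi[l] -= meta_lr * g               # update language-specific layer
    # shared parameters move toward good post-adaptation loss on ALL languages
    theta -= meta_lr * meta_grad / len(languages)

final = {l: loss(theta, phi[l], l) for l in languages}
```

In this sketch both per-language losses shrink toward zero because each language gets its own `phi[l]`, so the shared `theta` is no longer forced to a compromise that interferes with any single language — the intuition the paper's algorithm builds on at scale.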
Pages: 4438–4450 (13 pages)