Are All Languages Created Equal in Multilingual BERT?

Cited by: 0
Authors
Wu, Shijie [1]
Dredze, Mark [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
Source
5TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2020) | 2020
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Multilingual BERT (mBERT) (Devlin, 2018) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals (Wu and Dredze, 2019; Pires et al., 2019). However, these evaluations have focused on cross-lingual transfer with high-resource languages, covering only a third of the languages covered by mBERT. We explore how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance. We consider three tasks: Named Entity Recognition (99 languages), Part-of-speech Tagging, and Dependency Parsing (54 languages each). mBERT does better than or comparable to baselines on high-resource languages but does much worse for low-resource languages. Furthermore, monolingual BERT models for these languages do even worse. Paired with similar languages, the performance gap between monolingual BERT and mBERT can be narrowed. We find that better models for low-resource languages require more efficient pretraining techniques or more data.
Pages: 120-130
Page count: 11