INFOXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Cited by: 0
Authors
Chi, Zewen [1 ,2 ]
Dong, Li [2 ]
Wei, Furu [2 ]
Yang, Nan [2 ]
Singhal, Saksham [2 ]
Wang, Wenhui [2 ]
Song, Xia [2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Zhou, Ming [2 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
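As a rough illustration of the contrastive objective described in the abstract (a translation pair treated as two views of the same meaning, with other sentences in the batch acting as negatives), the following is a minimal sketch of an InfoNCE-style loss in PyTorch. The function name, temperature value, and in-batch negative setup are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_emb, tgt_emb, temperature=0.1):
    # src_emb, tgt_emb: [batch, dim] sentence representations of the two
    # "views" of each translation pair (e.g. an English sentence and its
    # translation). The matching translation is the positive; all other
    # target sentences in the batch serve as negatives (an illustrative
    # setup, not the paper's exact negative-sampling scheme).
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature                    # pairwise cosine similarities
    labels = torch.arange(src.size(0), device=src.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example with random stand-in embeddings:
src_emb = torch.randn(32, 768)
tgt_emb = torch.randn(32, 768)
loss = cross_lingual_contrastive_loss(src_emb, tgt_emb)

Minimizing this cross-entropy pushes each sentence's representation toward its translation and away from the other sentences in the batch, which is the intuition behind maximizing mutual information between the two views.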
Pages: 3576-3588
Page count: 13
Related Papers
50 items in total
  • [1] Alternating Language Modeling for Cross-Lingual Pre-Training. Yang, Jian; Ma, Shuming; Zhang, Dongdong; Wu, Shuangzhi; Li, Zhoujun; Zhou, Ming. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 9386-9393
  • [2] Cross-Lingual Natural Language Generation via Pre-Training. Chi, Zewen; Dong, Li; Wei, Furu; Wang, Wenhui; Mao, Xian-Ling; Huang, Heyan. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 7570-7577
  • [3] Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training. Zheng, Bo; Dong, Li; Huang, Shaohan; Singhal, Saksham; Che, Wanxiang; Liu, Ting; Song, Xia; Wei, Furu. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 3203-3215
  • [4] XLM-E: Cross-lingual Language Model Pre-training via ELECTRA. Chi, Zewen; Huang, Shaohan; Dong, Li; Ma, Shuming; Zheng, Bo; Singhal, Saksham; Bajaj, Payal; Song, Xia; Mao, Xian-Ling; Huang, Heyan; Wei, Furu. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 6170-6182
  • [5] Mixed-Lingual Pre-training for Cross-lingual Summarization. Xu, Ruochen; Zhu, Chenguang; Shi, Yu; Zeng, Michael; Huang, Xuedong. 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020: 536-541
  • [6] XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge. Jiang, Xiaoze; Liang, Yaobo; Chen, Weizhu; Duan, Nan. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 10840-10848
  • [7] Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks. Huang, Haoyang; Liang, Yaobo; Duan, Nan; Gong, Ming; Shou, Linjun; Jiang, Daxin; Zhou, Ming. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 2485-2494
  • [8] VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation. Luo, Fuli; Wang, Wei; Liu, Jiahao; Liu, Yijia; Bi, Bin; Huang, Songfang; Huang, Fei; Si, Luo. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021: 3980-3994
  • [9] Neural Machine Translation Based on XLM-R Cross-lingual Pre-training Language Model. Wang Q.; Li M.; Wu S.; Wang M. Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, 58 (01): 29-36
  • [10] On-the-fly Cross-lingual Masking for Multilingual Pre-training. Ai, Xi; Fang, Bin. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 855-876