InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Cited by: 0
Authors
Chi, Zewen [1 ,2 ]
Dong, Li [2 ]
Wei, Furu [2 ]
Yang, Nan [2 ]
Singhal, Saksham [2 ]
Wang, Wenhui [2 ]
Song, Xia [2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Zhou, Ming [2 ]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] Microsoft Corporation, Redmond, WA 98052, USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
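The contrastive task described in the abstract is an InfoNCE-style objective, which is a lower bound on the mutual information between the two views (here, the two sides of a translation pair). Below is a minimal illustrative sketch in PyTorch, not the authors' released implementation: it uses simple in-batch negatives (the paper itself draws negatives from a momentum-encoder queue), and the cosine similarity, the temperature of 0.05, and the `encode`-style embeddings it consumes are assumed for illustration.

```python
# Minimal sketch of an InfoNCE-style cross-lingual contrastive loss:
# each bilingual sentence pair (x_i, y_i) is treated as two views of the
# same meaning, and the other translations in the batch act as negatives.
# Assumes the (batch, dim) inputs are sentence embeddings from any encoder,
# e.g., the [CLS] vectors of a Transformer.

import torch
import torch.nn.functional as F


def cross_lingual_contrastive_loss(src_emb: torch.Tensor,
                                   tgt_emb: torch.Tensor,
                                   temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss over a batch of bilingual sentence-pair embeddings.

    Row i of src_emb and row i of tgt_emb are translations of each other;
    every other row in the batch serves as a negative example.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)

    # (batch, batch) cosine-similarity logits; the diagonal holds the
    # positive (translation) pairs.
    logits = src @ tgt.t() / temperature
    labels = torch.arange(src.size(0), device=src.device)

    # Symmetrize so both translation directions pull each pair together.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    # Toy usage with random embeddings standing in for encoder outputs.
    batch, dim = 8, 768
    loss = cross_lingual_contrastive_loss(torch.randn(batch, dim),
                                          torch.randn(batch, dim))
    print(loss.item())
```

On random inputs the loss sits near ln(batch size); minimizing it pushes each pair's similarity above all in-batch negatives in both translation directions, which is what "encourage their encoded representations to be more similar than the negative examples" amounts to.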
Pages: 3576-3588
Page count: 13
Related Papers
50 in total; items [31]-[40] shown below
  • [31] Investigating cross-lingual training for offensive language detection
    Pelicon, Andraz
    Shekhar, Ravi
    Skrlj, Blaz
    Purver, Matthew
    Pollak, Senja
    PEERJ COMPUTER SCIENCE, 2021, 7 : 2 - 39
  • [32] Language Anisotropic Cross-Lingual Model Editing
    Xu, Yang
    Hou, Yutai
    Che, Wanxiang
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5554 - 5569
  • [33] Cross-lingual Language Model Pretraining for Retrieval
    Yu, Puxuan
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1029 - 1039
  • [34] EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning
    Guo, Ping
    Wei, Xiangpeng
    Hu, Yue
    Yang, Baosong
    Liu, Dayiheng
    Huang, Fei
    Xie, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation
    Maurya, Kaushal Kumar
    Desarkar, Maunendra Sankar
    Kano, Yoshinobu
    Deepshikha, Kumari
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2804 - 2818
  • [36] Steering Large Language Models for Cross-lingual Information Retrieval
    Guo, Ping
    Ren, Yubing
    Hu, Yue
    Cao, Yanan
    Li, Yunpeng
    Huang, Heyan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 585 - 596
  • [37] Language Model Priming for Cross-Lingual Event Extraction
    Fincke, Steven
    Agarwal, Shantanu
    Miller, Scott
    Boschee, Elizabeth
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 10627 - 10635
  • [38] An information-theoretic, vector-space-model approach to cross-language information retrieval
    Chew, Peter A.
    Bader, Brett W.
    Helmreich, Stephen
    Abdelali, Ahmed
    Verzi, Stephen J.
    NATURAL LANGUAGE ENGINEERING, 2011, 17 : 37 - 70
  • [39] Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
    Li, Juntao
    He, Ruidan
    Ye, Hai
    Ng, Hwee Tou
    Bing, Lidong
    Yan, Rui
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3672 - 3678
  • [40] An Unsupervised Cross-Lingual Topic Model Framework for Sentiment Classification
    Lin, Zheng
    Jin, Xiaolong
    Xu, Xueke
    Wang, Yuanzhuo
    Cheng, Xueqi
    Wang, Weiping
    Meng, Dan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (03) : 432 - 444