InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Cited by: 0
Authors
Chi, Zewen [1 ,2 ]
Dong, Li [2 ]
Wei, Furu [2 ]
Yang, Nan [2 ]
Singhal, Saksham [2 ]
Wang, Wenhui [2 ]
Song, Xia [2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Zhou, Ming [2 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts. The unified view helps us better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
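To make the contrastive idea in the abstract concrete, below is a minimal PyTorch sketch of an InfoNCE-style objective over parallel sentence pairs: each bilingual pair is treated as two views of the same meaning, and the remaining sentences in the batch serve as negatives. The pooling, temperature, and function names are illustrative assumptions, not the paper's exact cross-lingual contrast configuration.

import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_repr, tgt_repr, temperature=0.1):
    # src_repr, tgt_repr: (batch, dim) sentence embeddings of parallel pairs,
    # e.g., pooled encoder states of a shared multilingual encoder (assumed).
    # Row i of src_repr and row i of tgt_repr encode translations of each
    # other; all other rows in the batch act as negatives.
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    # Pairwise cosine similarities between every source/target sentence.
    logits = src @ tgt.t() / temperature          # (batch, batch)
    # The matched translation of each source sentence sits on the diagonal.
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric InfoNCE: align src -> tgt and tgt -> src.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

In use, this loss would be minimized jointly with masked language modeling objectives on monolingual text, matching the abstract's description of leveraging both monolingual and parallel corpora.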
Pages: 3576-3588
Page count: 13
Related Papers
50 records in total
  • [21] Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
    Song, Yuqing
    Chen, Shizhe
    Jin, Qin
    Luo, Wei
    Xie, Jun
    Huang, Fei
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2843 - 2852
  • [22] An analysis on language transfer of pre-trained language model with cross-lingual post-training
    Son, Suhyune
    Park, Chanjun
    Lee, Jungseob
    Shim, Midan
    Lee, Chanhee
    Jang, Yoonna
    Seo, Jaehyung
    Lim, Jungwoo
    Lim, Heuiseok
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 267
  • [23] Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
    Li, Zejun
    Fan, Zhihao
    Chen, JingJing
    Zhang, Qi
    Huang, Xuanjing
    Wei, Zhongyu
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 5939 - 5958
  • [24] PTEKC: pre-training with event knowledge of ConceptNet for cross-lingual event causality identification
    Zhu, Enchang
    Yu, Zhengtao
    Huang, Yuxin
    Gao, Shengxiang
    Xian, Yantuan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (03) : 1859 - 1872
  • [25] Cross-lingual Language Model Pretraining
    Conneau, Alexis
    Lample, Guillaume
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [26] Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-training
    Hardalov, Momchil
    Arora, Arnav
    Nakov, Preslav
    Augenstein, Isabelle
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 10729 - 10737
  • [27] Contrastive pre-training and instruction tuning for cross-lingual aspect-based sentiment analysis
    Zhao, Wenwen
    Yang, Zhisheng
    Yu, Song
    Zhu, Shiyu
    Li, Li
    APPLIED INTELLIGENCE, 2025, 55 (05)
  • [28] Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine Translation
    Ji, Baijun
    Zhang, Zhirui
    Duan, Xiangyu
    Zhang, Min
    Chen, Boxing
    Luo, Weihua
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 115 - 122
  • [29] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174