VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation

Cited by: 0
Authors
Luo, Fuli [1 ]
Wang, Wei [1 ]
Liu, Jiahao [1 ]
Liu, Yijia [1 ]
Bi, Bin [1 ]
Huang, Songfang [1 ]
Huang, Fei [1 ]
Si, Luo [1 ]
Affiliations
[1] Alibaba Group, Hangzhou, People's Republic of China
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Existing work in multilingual pretraining has demonstrated the potential of cross-lingual transferability by training a unified Transformer encoder for multiple languages. However, much of this work relies only on a shared vocabulary and bilingual contexts to encourage correlation across languages, which is a loose and implicit way to align contextual representations between languages. In this paper, we plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. It effectively avoids the degeneration of predicting masked words conditioned only on the context of the word's own language. More importantly, when fine-tuning on downstream tasks, the cross-attention module can be plugged in or out on demand, thus naturally benefiting a wider range of cross-lingual tasks, from language understanding to generation. As a result, the proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1-2 BLEU.
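To make the plug-in/plug-out idea from the abstract concrete, here is a minimal PyTorch sketch of a Transformer encoder layer with an optional cross-attention block. The class name PluggableEncoderLayer, the default hyperparameters, and the cross_states argument are assumptions for illustration; this sketch does not reproduce VECO's actual implementation.

```python
# Minimal sketch: an encoder layer whose cross-attention block can be
# "plugged in" (pre-training on bilingual pairs) or "plugged out"
# (fine-tuning on monolingual understanding tasks). All names and
# hyperparameters here are illustrative assumptions, not VECO's code.
import torch
import torch.nn as nn

class PluggableEncoderLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, cross_states=None):
        # Standard self-attention over the input sentence.
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(h))
        # Optional cross-attention over the paired sentence in the other
        # language: queries come from x, keys/values from cross_states,
        # so masked-word prediction is conditioned on both languages.
        if cross_states is not None:
            h, _ = self.cross_attn(x, cross_states, cross_states)
            x = self.norm2(x + self.dropout(h))
        h = self.ffn(x)
        return self.norm3(x + self.dropout(h))

layer = PluggableEncoderLayer()
src = torch.randn(2, 16, 768)                # source-language hidden states
tgt = torch.randn(2, 20, 768)                # paired-sentence hidden states
plugged_in = layer(src, cross_states=tgt)    # cross-attention module "in"
plugged_out = layer(src)                     # module "out": plain encoder layer
```

When cross_states is omitted, the layer reduces to a standard encoder layer, which is what lets the same pre-trained weights serve both understanding tasks (module out) and generation tasks (module in).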
Pages: 3980-3994
Number of pages: 15
Related Papers
50 records in total
  • [31] MULTI-STYLE ADAPTIVE TRAINING FOR ROBUST CROSS-LINGUAL SPOKEN LANGUAGE UNDERSTANDING
    He, Xiaodong
    Deng, Li
    Hakkani-Tur, Dilek
    Tur, Gokhan
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 8342-8346
  • [32] Multimodal Pre-training Method for Vision-language Understanding and Generation
    Liu T.-Y.
    Wu Z.-X.
    Chen J.-J.
    Jiang Y.-G.
Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): 2024-2034
  • [33] An analysis on language transfer of pre-trained language model with cross-lingual post-training
    Son, Suhyune
    Park, Chanjun
    Lee, Jungseob
    Shim, Midan
    Lee, Chanhee
    Jang, Yoonna
    Seo, Jaehyung
    Lim, Jungwoo
    Lim, Heuiseok
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 267
  • [34] Cross-lingual Spoken Language Understanding with Regularized Representation Alignment
    Liu, Zihan
    Winata, Genta Indra
    Xu, Peng
    Lin, Zhaojiang
    Fung, Pascale
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 7241-7251
  • [35] FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
    Fang, Yuwei
    Wang, Shuohang
    Gan, Zhe
    Sun, Siqi
    Liu, Jingjing
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 12776-12784
  • [36] EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning
    Guo, Ping
    Wei, Xiangpeng
    Hu, Yue
    Yang, Baosong
    Liu, Dayiheng
    Huang, Fei
    Xie, Jun
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [37] Unified pre-training for program understanding and generation
    Ahmad, Wasi Uddin
    Chakraborty, Saikat
    Ray, Baishakhi
    Chang, Kai-Wei
arXiv, 2021
  • [38] Unified Pre-training for Program Understanding and Generation
    Ahmad, Wasi Uddin
    Chakraborty, Saikat
    Ray, Baishakhi
    Chang, Kai-Wei
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 2655-2668
  • [39] ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation
    Maurya, Kaushal Kumar
    Desarkar, Maunendra Sankar
    Kano, Yoshinobu
    Deepshikha, Kumari
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 2804-2818
  • [40] MPNet: Masked and Permuted Pre-training for Language Understanding
    Song, Kaitao
    Tan, Xu
    Qin, Tao
    Lu, Jianfeng
    Liu, Tie-Yan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33