Cross-lingual learning for text processing: A survey

Cited by: 33
Authors
Pikuliak, Matus [1]
Simko, Marian [1]
Bielikova, Maria [1]
Affiliations
[1] Slovak Univ Technol Bratislava, Fac Informat & Informat Technol, Ilkovicova 2, Bratislava 84216, Slovakia
Keywords
Cross-lingual learning; Multilingual learning; Transfer learning; Deep learning; Machine learning; Text processing; Natural language processing;
DOI
10.1016/j.eswa.2020.113765
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many intelligent systems in business, government, or academia process natural language as input during inference, or even communicate with users in natural language. Natural language processing is currently often done with machine learning models. However, machine learning needs training data, and such data are often scarce for low-resource languages. The lack of data and the resulting poor performance of natural language processing can be solved with cross-lingual learning. Cross-lingual learning is a paradigm for transferring knowledge from one natural language to another. This transfer of knowledge can help us overcome the lack of data in the target languages and create intelligent systems and machine learning models for languages where this was not previously possible. Despite its increasing popularity and potential, no comprehensive survey of cross-lingual learning has been conducted so far. We survey 173 text processing cross-lingual learning papers and examine the tasks, data sets, and languages that were used. The most important contribution of our work is that we identify and analyze four types of cross-lingual transfer based on "what" is being transferred. Such insight might help other NLP researchers and practitioners understand how to use cross-lingual learning for a wide range of problems. In addition, we identify what we consider to be the most important research directions, which might help the community focus its future work in cross-lingual learning. We present a comprehensive table of all the surveyed papers with various data related to the cross-lingual learning techniques they use. The table can be used to find relevant papers and compare approaches to cross-lingual learning. To the best of our knowledge, no survey of cross-lingual text processing techniques has been done in this scope before. (C) 2020 Published by Elsevier Ltd.
Pages: 26
Related papers
50 in total
  • [11] Cross-lingual Text Clustering in a Large System
    Schneider, Nicole R.
    Sankaranarayanan, Jagan
    Samet, Hanan
    PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023, 2023, : 1 - 11
  • [12] Cross-Lingual Speech-to-Text Summarization
    Pontes, Elvys Linhares
    Gonzalez-Gallardo, Carlos-Emiliano
    Torres-Moreno, Juan-Manuel
    Huet, Stephane
    MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, 2019, 833 : 385 - 395
  • [13] On cross-lingual retrieval with multilingual text encoders
    Robert Litschko
    Ivan Vulić
    Simone Paolo Ponzetto
    Goran Glavaš
    Information Retrieval Journal, 2022, 25 : 149 - 183
  • [14] SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism
    Fatima, Mehwish
    Kolber, Tim
    Markert, Katja
    Strube, Michael
    NewSumm 2023 - Proceedings of the 4th New Frontiers in Summarization Workshop, Proceedings of EMNLP Workshop, 2023, : 24 - 40
  • [15] Cross-lingual text filtering based on text concepts and kNN
    Li, SZ
    Su, WF
    Li, TQ
    Chen, HW
    PACLIC 17: Language, Information and Computation, Proceedings, 2003, : 166 - 173
  • [16] Cross-Lingual Learning with Distributed Representations
    Pikuliak, Matus
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8032 - 8033
  • [17] Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification
    Dong, Xin
    Zhu, Yaxin
    Zhang, Yupeng
    Fu, Zuohui
    Xu, Dongkuan
    Yang, Sen
    de Melo, Gerard
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1541 - 1544
  • [18] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174
  • [20] A survey of cross-lingual word embedding models
    Ruder, Sebastian
    Vulić, Ivan
    Søgaard, Anders
    Journal of Artificial Intelligence Research, 2019, 65 : 569 - 631