Cross-lingual learning for text processing: A survey

Cited by: 33
Authors
Pikuliak, Matus [1]
Simko, Marian [1]
Bielikova, Maria [1]
Affiliations
[1] Slovak Univ Technol Bratislava, Fac Informat & Informat Technol, Ilkovicova 2, Bratislava 84216, Slovakia
Keywords
Cross-lingual learning; Multilingual learning; Transfer learning; Deep learning; Machine learning; Text processing; Natural language processing;
DOI
10.1016/j.eswa.2020.113765
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many intelligent systems in business, government, or academia process natural language as input during inference, and some even communicate with users in natural language. Natural language processing is currently often done with machine learning models. However, machine learning needs training data, and such data are often scarce for low-resource languages. The lack of data and the resulting poor performance of natural language processing can be solved with cross-lingual learning. Cross-lingual learning is a paradigm for transferring knowledge from one natural language to another. The transfer of knowledge can help us overcome the lack of data in the target languages and create intelligent systems and machine learning models for languages where this was not previously possible. Despite its increasing popularity and potential, no comprehensive survey of cross-lingual learning has been conducted so far. We survey 173 text processing cross-lingual learning papers and examine the tasks, data sets, and languages that were used. The most important contribution of our work is that we identify and analyze four types of cross-lingual transfer based on "what" is being transferred. Such insight might help other NLP researchers and practitioners understand how to use cross-lingual learning for a wide range of problems. In addition, we identify what we consider to be the most important research directions, which might help the community focus its future work in cross-lingual learning. We present a comprehensive table of all the surveyed papers with various data related to the cross-lingual learning techniques they use. The table can be used to find relevant papers and compare the approaches to cross-lingual learning. To the best of our knowledge, no survey of cross-lingual text processing techniques of this scope has been done before. © 2020 Published by Elsevier Ltd.
Pages: 26
Related papers
50 in total
  • [21] Cross-lingual and multilingual ontology mapping - survey
    Ivanova, Tatyana
    COMPUTER SYSTEMS AND TECHNOLOGIES (COMPSYSTECH'18), 2018, 1641 : 50 - 57
  • [22] Cross-Lingual Transfer of Cognitive Processing Complexity
    Pouw, Charlotte
    Hollenstein, Nora
    Beinborn, Lisa
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 655 - 669
  • [23] Cross-Lingual Korean Speech-to-Text Summarization
    Yoon, HyoJeon
    Dinh Tuyen Hoang
    Ngoc Thanh Nguyen
    Hwang, Dosam
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I, 2019, 11431 : 198 - 206
  • [24] A Survey of Cross-lingual Word Embedding Models
    Ruder, Sebastian
    Vulic, Ivan
    Sogaard, Anders
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2019, 65 : 569 - 630
  • [25] Adverse Conditions and Techniques for Cross-Lingual Text Recognition
    Kaur, Achint
    Shrawankar, Urmila
    2017 INTERNATIONAL CONFERENCE ON INNOVATIVE MECHANISMS FOR INDUSTRY APPLICATIONS (ICIMIA), 2017, : 70 - 74
  • [26] A Comparative Evaluation of Cross-Lingual Text Annotation Techniques
    Zhang, Lei
    Rettinger, Achim
    Faerber, Michael
    Tadic, Marko
    INFORMATION ACCESS EVALUATION: MULTILINGUALITY, MULTIMODALITY, AND VISUALIZATION, 2013, 8138 : 124 - 135
  • [27] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [28] A multilingual text mining approach to web cross-lingual text retrieval
    Chau, RW
    Yeh, CH
    KNOWLEDGE-BASED SYSTEMS, 2004, 17 (5-6) : 219 - 227
  • [29] Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates
    Qi, Kunxun
    Wan, Hai
    Du, Jianfeng
    Chen, Haolan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1910 - 1923
  • [30] Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2023, 41 (02)