Cross-lingual learning for text processing: A survey

Cited by: 33
Authors
Pikuliak, Matus [1]
Simko, Marian [1]
Bielikova, Maria [1]
Affiliations
[1] Slovak Univ Technol Bratislava, Fac Informat & Informat Technol, Ilkovicova 2, Bratislava 84216, Slovakia
Keywords
Cross-lingual learning; Multilingual learning; Transfer learning; Deep learning; Machine learning; Text processing; Natural language processing;
DOI
10.1016/j.eswa.2020.113765
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many intelligent systems in business, government, or academia process natural language as input during inference, or they may even communicate with users in natural language. Natural language processing is currently often done with machine learning models. However, machine learning needs training data, and such data are often scarce for low-resource languages. The lack of data and the resulting poor performance of natural language processing can be solved with cross-lingual learning. Cross-lingual learning is a paradigm for transferring knowledge from one natural language to another. The transfer of knowledge can help us overcome the lack of data in the target languages and create intelligent systems and machine learning models for languages where this was not previously possible. Despite its increasing popularity and potential, no comprehensive survey on cross-lingual learning has been conducted so far. We survey 173 text processing cross-lingual learning papers and examine the tasks, data sets and languages that were used. The most important contribution of our work is that we identify and analyze four types of cross-lingual transfer based on "what" is being transferred. Such insight might help other NLP researchers and practitioners understand how to use cross-lingual learning for a wide range of problems. In addition, we identify what we consider to be the most important research directions, which might help the community focus its future work in cross-lingual learning. We present a comprehensive table of all the surveyed papers with various data related to the cross-lingual learning techniques they use. The table can be used to find relevant papers and compare the approaches to cross-lingual learning. To the best of our knowledge, no survey of cross-lingual text processing techniques has been done in this scope before. (C) 2020 Published by Elsevier Ltd.
Pages: 26
Related papers
50 in total
  • [31] TCS: A Teacher-Curriculum-Student Learning Framework for Cross-Lingual Text Labeling
    Pu T.
    Huang S.-J.
    Zhang Y.-M.
    Zhou X.-S.
    Tu Y.-F.
    Dai X.-Y.
    Chen J.-J.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09) : 1983 - 1996
  • [32] Translation Artifacts in Cross-lingual Transfer Learning
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7674 - 7684
  • [33] Lightweight Cross-Lingual Sentence Representation Learning
    Mao, Zhuoyuan
    Gupta, Prakhar
    Chu, Chenhui
    Jaggi, Martin
    Kurohashi, Sadao
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 2902 - 2913
  • [34] Choosing Transfer Languages for Cross-Lingual Learning
    Lin, Yu-Hsiang
    Chen, Chian-Yu
    Lee, Jean
    Li, Zirui
    Zhang, Yuyan
    Xia, Mengzhou
    Rijhwani, Shruti
    He, Junxian
    Zhang, Zhisong
    Ma, Xuezhe
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3125 - 3135
  • [35] Active Learning for Cross-Lingual Sentiment Classification
    Li, Shoushan
    Wang, Rong
    Liu, Huanhuan
    Huang, Chu-Ren
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 236 - 246
  • [36] Bleaching Text: Abstract Features for Cross-lingual Gender Prediction
    van der Goot, Rob
    Ljubesic, Nikola
    Matroos, Ian
    Nissim, Malvina
    Plank, Barbara
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 383 - 389
  • [37] Cross-Lingual Text Classification with Model Translation and Document Translation
    Moh, Teng-Sheng
    Zhang, Zhang
    PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE, 2012,
  • [38] SpeakerNet for Cross-lingual Text-Independent Speaker Verification
    Habib, Hafsa
    Tauseef, Huma
    Fahiem, Muhammad Abuzar
    Farhan, Saima
    Usman, Ghousia
    ARCHIVES OF ACOUSTICS, 2020, 45 (04) : 573 - 583
  • [39] Emotion Detection in Cross-Lingual Text Based on Bidirectional LSTM
    Ren, Han
    Wan, Jing
    Ren, Yafeng
    SECURITY WITH INTELLIGENT COMPUTING AND BIG-DATA SERVICES, 2020, 895 : 838 - 845
  • [40] Cross-lingual Text Classification with Heterogeneous Graph Neural Network
    Wang, Ziyun
    Liu, Xuan
    Yang, Peiji
    Liu, Shixing
    Wang, Zhisheng
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 612 - 620