Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization

Cited by: 1
Authors
Effland, Thomas [1]
Collins, Michael [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google Res, New York, NY USA
Keywords
Compendex;
DOI
10.1162/tacl_a_00537
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present Expected Statistic Regularization (ESR), a novel regularization technique that utilizes low-order multi-task structural statistics to shape model distributions for semi-supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and labeled dependency parsing) and present several classes of low-order statistic functions that bear on model behavior. Experimentally, we evaluate the proposed statistics with ESR for unsupervised transfer on 5 diverse target languages and show that all statistics, when estimated accurately, yield improvements to both POS and LAS, with the best statistic improving POS by +7.0 and LAS by +8.5 on average. We also present semi-supervised transfer and learning curve experiments that show ESR provides significant gains over strong cross-lingual-transfer-plus-fine-tuning baselines for modest amounts of label data. These results indicate that ESR is a promising and complementary approach to model-transfer approaches for cross-lingual parsing.
Pages: 122-138
Page count: 17
Related papers
50 in total
  • [31] Is Translation Helpful? An Exploration of Cross-Lingual Transfer in Low-Resource Dialog Generation
    Shen, Lei
    Yu, Shuai
    Shen, Xiaoyu
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [32] SUBSPACE MIXTURE MODEL FOR LOW-RESOURCE SPEECH RECOGNITION IN CROSS-LINGUAL SETTINGS
    Miao, Yajie
    Metze, Florian
    Waibel, Alex
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7339 - 7343
  • [33] Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
    Duong, Long
    Cohn, Trevor
    Bird, Steven
    Cook, Paul
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 845 - 850
  • [34] Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning
    Sazzed, Salim
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021), 2021, 12801 : 218 - 230
  • [35] ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
    Casanova, Edresson
    Shulby, Christopher
    Korolev, Alexander
    Candido Junior, Arnaldo
    Soares, Anderson da Silva
    Aluisio, Sandra
    Ponti, Moacir Antonelli
    INTERSPEECH 2023, 2023, : 1244 - 1248
  • [36] Intent detection and slot filling for Persian: Cross-lingual training for low-resource languages
    Zadkamali, Reza
    Momtazi, Saeedeh
    Zeinali, Hossein
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): 559 - 574
  • [37] LEARNING CROSS-LINGUAL INFORMATION WITH MULTILINGUAL BLSTM FOR SPEECH SYNTHESIS OF LOW-RESOURCE LANGUAGES
    Yu, Quanjie
    Liu, Peng
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5545 - 5549
  • [38] Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
    Farooq, Muhammad Umar
    Hain, Thomas
    INTERSPEECH 2023, 2023, : 5072 - 5076
  • [39] Augmenting Low-Resource Cross-Lingual Summarization with Progression-Grounded Training and Prompting
    Ma, Jiu Shun
    Huang, Yuxin
    Wang, Linqin
    Huang, Xiang
    Peng, Hao
    Yu, Zhengtao
    Yu, Philip
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (09)
  • [40] Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios
    Eskander, Ramy
    Muresan, Smaranda
    Collins, Michael
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4820 - 4831