Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization

被引:1
|
作者
Effland, Thomas [1 ]
Collins, Michael [2 ]
机构
[1] Columbia Univ, New York, NY 10027 USA
[2] Google Res, New York, NY USA
关键词
Compendex;
D O I
10.1162/tacl_a_00537
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present Expected Statistic Regulariza tion (ESR), a novel regularization technique that utilizes low-order multi-task structural statistics to shape model distributions for semi- supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and labeled dependency parsing) and present several classes of low-order statistic functions that bear on model behavior. Experimentally, we evaluate the proposed statistics with ESR for unsupervised transfer on 5 diverse target languages and show that all statistics, when estimated accurately, yield improvements to both POS and LAS, with the best statistic improving POS by +7.0 and LAS by +8.5 on average. We also present semi-supervised transfer and learning curve experiments that show ESR provides significant gains over strong cross-lingual-transfer-plus-fine-tuning baselines for modest amounts of label data. These results indicate that ESR is a promising and complementary approach to model-transfer approaches for cross-lingual parsing.(1)
引用
收藏
页码:122 / 138
页数:17
相关论文
共 50 条
  • [21] UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
    Trinh Pham
    Le, Khoi M.
    Luu Anh Tuan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3168 - 3184
  • [22] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
  • [23] Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
    Gupta, Shivanshu
    Matsubara, Yoshitomo
    Chadha, Ankit
    Moschitti, Alessandro
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14078 - 14092
  • [24] Adversarial Cross-Lingual Transfer Learning for Slot Tagging of Low-Resource Languages
    He, Keqing
    Yan, Yuanmeng
    Xu, Weiran
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [25] Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
    Taghizadeh, Nasrin
    Faili, Hesham
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 56 : 61 - 87
  • [26] Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning
    Agrawal, Ashish Sunil
    Fazili, Barah
    Jyothi, Preethi
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 319 - 329
  • [27] Cross-lingual subspace Gaussian mixture models for low-resource speech recognition
    1600, Institute of Electrical and Electronics Engineers Inc., United States (22):
  • [28] Cross-lingual offensive speech identification with transfer learning for low-resource languages
    Shi, Xiayang
    Liu, Xinyi
    Xu, Chun
    Huang, Yuanyuan
    Chen, Fang
    Zhu, Shaolin
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [29] Cross-lingual subspace Gaussian mixture models for low-resource speech recognition
    1600, Institute of Electrical and Electronics Engineers Inc., United States (22):
  • [30] CAM: A cross-lingual adaptation framework for low-resource language speech recognition
    Hu, Qing
    Zhang, Yan
    Zhang, Xianlei
    Han, Zongyu
    Yu, Xilong
    INFORMATION FUSION, 2024, 111