Chinese-Vietnamese Cross-Language Event Retrieval Incorporating Event Knowledge

Cited by: 0
Authors
Huang Y. [1,2]
Deng T. [1,2]
Yu Z. [1,2]
Xian Y. [1,2]
Affiliations
[1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming
[2] Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming
Funding
National Natural Science Foundation of China
Keywords
Contrastive Learning; Cross-Language Event Retrieval; Event Knowledge; Event Pre-training
DOI
10.16451/j.cnki.issn1003-6059.202310003
Abstract
The goal of the Chinese-Vietnamese cross-language event retrieval task is to retrieve Vietnamese documents expressing the same event as an input Chinese query. Existing cross-language retrieval models align poorly on the low-resource Chinese-Vietnamese pair, and simple semantic matching struggles to capture the event semantics of complex queries. To address this, a Chinese-Vietnamese cross-language event retrieval method incorporating event knowledge is proposed. A Chinese-Vietnamese cross-language event pre-training module is built for continued pre-training, improving the model's representations of the low-resource Chinese and Vietnamese languages. Contrastive learning is then used to discriminate between the masked predicted values and the true values of an event, encouraging the model to better understand and capture event knowledge features. Experiments on cross-language event retrieval and cross-language question answering tasks demonstrate the performance improvement of the proposed method. © 2023 Journal of Pattern Recognition and Artificial Intelligence. All rights reserved.
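As a rough illustration of the contrastive objective described in the abstract (a minimal sketch, not the authors' released code), the snippet below computes an InfoNCE-style loss that pulls the encoder's prediction at a masked event position toward the representation of the true event token, treating the other in-batch positions as negatives. The function name, temperature value, and vector dimensions are illustrative assumptions.

```python
# Sketch only: contrastive discrimination between masked predictions and true
# event-token representations, with in-batch negatives. Names and hyperparameters
# are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

def event_mask_contrastive_loss(pred_vecs: torch.Tensor,
                                true_vecs: torch.Tensor,
                                temperature: float = 0.05) -> torch.Tensor:
    """pred_vecs: [N, d] encoder outputs at masked event positions.
       true_vecs: [N, d] representations of the original (unmasked) event tokens."""
    pred = F.normalize(pred_vecs, dim=-1)
    true = F.normalize(true_vecs, dim=-1)
    # Similarity of every prediction against every true vector; the diagonal
    # entries are the positives (each prediction vs. its own ground truth).
    logits = pred @ true.t() / temperature            # [N, N]
    targets = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    # Toy usage with random vectors standing in for encoder outputs.
    torch.manual_seed(0)
    pred = torch.randn(8, 768)
    true = torch.randn(8, 768)
    print(event_mask_contrastive_loss(pred, true).item())
```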
Pages: 890-901
Page count: 11