A Synonym Mining Algorithm Based on Pair-wise Character Embedding and Noisy Robust Learning

Authors
Zhang H.-Y. [1, 2]
Wang J. [1]
Affiliations
[1] State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha
[2] Artificial Intelligence Research Center, Defense Innovation Institute, Beijing
Funding
National Natural Science Foundation of China
Keywords
information extraction; natural language processing; noisy label learning; pair-wise character embedding; synonym mining
DOI
10.16383/j.aas.c210004
Abstract
Synonym mining is an important task in natural language processing. To construct large-scale training corpora, existing studies extract synonym seeds using distant supervision and click-graph filtering, which inevitably introduces noisy labels and hinders the training of high-quality synonym mining models. In addition, most entity words are few-shot and subject to domain distribution shift, and the training objective of pre-trained word embeddings is inconsistent with the synonym mining task, so pre-trained word embeddings struggle to produce high-quality semantic representations of entities. To address these two issues, this paper proposes a synonym mining model that combines pair-wise character embeddings with a noise-robust learning framework. The model uses pre-trained pair-wise character embeddings to enhance entity semantic representations, and it estimates the true label distribution and generates pseudo-labels through a joint optimization process. These two components are intended to improve the representation ability and robustness of the model. Finally, WordNet is used to analyze and filter the noisy datasets, and experiments are conducted on synonym datasets of different sizes and domains. The experimental results show that the proposed model improves synonym set-instance classification and set generation performance over competitive baseline methods under different data distributions and noise ratios. © 2023 Science Press. All rights reserved.
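The abstract does not say how the pair-wise character embeddings are built or pooled. As a rough illustration only, the sketch below assumes that "pair-wise" means adjacent character pairs (bigrams) and that an entity vector is the mean of its bigram vectors. The functions `char_bigrams`, `toy_embedding`, `entity_vector`, and `cosine` are hypothetical names, and the hash-based vectors are a deterministic stand-in for real pre-trained embeddings, not the authors' model.

```python
import hashlib
import math
from typing import List

DIM = 16  # embedding size, chosen arbitrarily for this sketch


def char_bigrams(entity: str) -> List[str]:
    """Split an entity string into adjacent character pairs (assumed meaning of 'pair-wise')."""
    text = entity.lower()
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def toy_embedding(token: str) -> List[float]:
    """Deterministic stand-in for a pre-trained embedding lookup (assumption, not a real model)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to values in roughly [-1, 1].
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def entity_vector(entity: str) -> List[float]:
    """Mean-pool the bigram vectors into one entity representation."""
    grams = char_bigrams(entity)
    if not grams:
        return [0.0] * DIM
    vectors = [toy_embedding(g) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = entity_vector("United States")
    b = entity_vector("United States of America")
    print(char_bigrams("USA"), round(cosine(a, b), 3))
```

Because the stand-in vectors carry no semantics, the similarity printed here reflects only shared character pairs. With genuinely pre-trained embeddings, the same pooling step is where sub-word information could supplement sparse word-level representations of rare entities.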
Pages: 1181-1194
Page count: 13