Computing Text Semantic Similarity with Syntactic Network of Co-occurrence Distance

Cited by: 0
Authors
Jiao Y. [1]
Jing M. [1]
Kang F. [2]
Affiliations
[1] College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing
[2] Department of Computer Science and Technology, Nanjing University, Nanjing
Keywords
Co-occurrence Distance; Dependency Grammar; Feature Extraction; Semantic Similarity; Text Complex Network
DOI
10.11925/infotech.2096-3467.2019.0737
Abstract
[Objective] This paper aims to overcome the limitations of existing text similarity methods by combining multiple text features: semantics, syntax, and word frequency. [Methods] First, we constructed a text complex network that combines co-occurrence distance with dependency syntax. Then, we used information entropy to determine the weights of the dynamic characteristics. Finally, we used word embeddings, syntactic structure, and inverted file information to avoid losing word structure and semantics. [Results] Compared with the syntactic network + TF-IDF algorithm, the F1 value of the proposed algorithm increased by up to 12.1%. The result was 5.8% higher than that of the co-occurrence network + semantic method. The average F1 values were 5.8% and 1.6% higher than those of the existing methods. [Limitations] The selection of indicators for feature extraction needs further improvement so that it captures the importance of nodes more comprehensively. [Conclusions] Compared with traditional methods, the proposed model can reduce the loss of text information and effectively improve the accuracy of text similarity calculation. © 2019 Chinese Academy of Sciences.
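The first two [Methods] steps can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the function names `cooccurrence_edges` and `entropy_weights` are hypothetical, edge weights are taken as the sum of inverse word distances inside a sliding window, dependency-syntax edges are omitted, and the weighting step uses the generic entropy weight method rather than the authors' exact procedure.

```python
import math
from collections import defaultdict


def cooccurrence_edges(tokens, window=3):
    """Build undirected edges weighted by co-occurrence distance.

    Assumption: each co-occurrence within `window` positions contributes
    1/distance to the edge weight, so adjacent words count the most.
    """
    weights = defaultdict(float)
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            right = tokens[j]
            if left != right:
                weights[tuple(sorted((left, right)))] += 1.0 / (j - i)
    return dict(weights)


def entropy_weights(matrix):
    """Generic entropy weight method.

    Rows are nodes, columns are non-negative indicators (for example degree
    or weighted degree). Indicators whose values vary more across nodes have
    lower entropy and therefore receive a larger weight.
    """
    n, m = len(matrix), len(matrix[0])
    raw = []
    for k in range(m):
        column = [row[k] for row in matrix]
        total = sum(column)
        if total == 0 or n < 2:
            raw.append(0.0)
            continue
        probs = [value / total for value in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        raw.append(1.0 - entropy)
    norm = sum(raw)
    # Fall back to equal weights when no indicator discriminates between nodes.
    return [r / norm for r in raw] if norm > 0 else [1.0 / m] * m


edges = cooccurrence_edges(["text", "similarity", "network"], window=2)
weights = entropy_weights([[1, 1], [1, 3]])
```

In this toy input the first indicator is identical for both nodes, so it carries no information and the second indicator receives all of the weight.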
Pages: 93-100
Page count: 7