MPMFC: A Traditional Chinese Medicine Patent Classification Model Integrating Network Neighborhood Structural Features and Patent Semantic Features

Cited by: 0
Authors
Deng N. [1]
He X. [1]
Chen W. [1]
Chen X. [2]
Affiliations
[1] School of Computer Science, Hubei University of Technology, Wuhan
[2] School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan
Funding
National Natural Science Foundation of China
Keywords
Feature Fusion; Model Node2Vec; Patent Similarity Network; Pre-Training; TCM Patent Classification
DOI
10.11925/infotech.2096-3467.2022.0429
Abstract
[Objective] This study addresses the low accuracy of classification models for Traditional Chinese Medicine (TCM) patents, which stems from the complexity of TCM and from insufficient extraction of the patents' characteristic information. [Methods] We propose MPMFC (Medicine Patent Multi-feature Fusion Classifier), a classification model for TCM patents. First, we construct a TCM patent similarity network from the similarity of the patents' core fields. Next, we use the Node2Vec algorithm to capture latent neighborhood structure information for each patent from the global structure of this network and map it to low-dimensional vectors that serve as supplementary features. Finally, an attention mechanism fuses each patent's semantic feature vector, produced by the pre-trained RoBERTa-Tiny model, with its supplementary features to classify TCM patents automatically. [Results] We evaluated MPMFC on a corpus of 7,000 TCM patents. It achieved accuracy, recall, and F1 values of 0.8436, 0.8017, and 0.8221, respectively, which are 1.58%, 2.59%, and 2.11% higher than those of the baseline classification model. [Limitations] The weights assigned when constructing the TCM patent similarity network are set subjectively, and patents labeled by non-TCM researchers may contain some classification errors. [Conclusions] By acquiring and learning more comprehensive feature representations from multiple perspectives, MPMFC improves the accuracy of TCM patent classification. © 2023 Data Analysis and Knowledge Discovery. All rights reserved.
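The first two steps of the method (building a similarity network from core-field similarities, then sampling Node2Vec-style neighborhoods from it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which fields, similarity measure, weights, or threshold were used, so the Jaccard measure, the field names `title` and `claims`, the weights, the threshold, and the toy patents below are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_network(patents, field_weights, threshold):
    """Link two patents when the weighted sum of per-field similarities
    reaches the threshold. Weights and threshold are assumed values."""
    graph = {pid: {} for pid in patents}
    for u, v in combinations(patents, 2):
        score = sum(
            w * jaccard(patents[u][f], patents[v][f])
            for f, w in field_weights.items()
        )
        if score >= threshold:
            graph[u][v] = score
            graph[v][u] = score
    return graph


def node2vec_walk(graph, start, length, p, q, rng):
    """One second-order biased random walk in the style of Node2Vec.
    p controls returning to the previous node, q controls moving outward."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                if n == prev:
                    bias = 1.0 / p  # step back to the previous node
                elif n in graph[prev]:
                    bias = 1.0      # stay one hop away from the previous node
                else:
                    bias = 1.0 / q  # move farther from the previous node
                weights.append(graph[cur][n] * bias)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy patents with two tokenized "core fields".
patents = {
    "P1": {"title": ["ginseng", "decoction"],
           "claims": ["ginseng", "astragalus", "fatigue"]},
    "P2": {"title": ["ginseng", "capsule"],
           "claims": ["ginseng", "astragalus", "immunity"]},
    "P3": {"title": ["honeysuckle", "granule"],
           "claims": ["honeysuckle", "forsythia", "fever"]},
    "P4": {"title": ["honeysuckle", "decoction"],
           "claims": ["honeysuckle", "forsythia", "cough"]},
}
field_weights = {"title": 0.4, "claims": 0.6}  # assumed weights
graph = build_similarity_network(patents, field_weights, threshold=0.3)
walk = node2vec_walk(graph, "P1", length=5, p=1.0, q=2.0, rng=random.Random(0))
```

In a full Node2Vec pipeline, many such walks per node would be passed to a skip-gram model to obtain the low-dimensional vectors; that training step and the attention-based fusion with RoBERTa-Tiny features are omitted here.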
Pages: 145-158
Page count: 13