News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Cited by: 3

Authors:

Ao Xiong [1]

Derong Liu [1]

Hongkang Tian [1]

Zhengyuan Liu [1]

Peng Yu [1]

Michel Kadoch [2]

Affiliations:

[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications

[2] École de Technologie Supérieure, Université du Québec

Source:

Tsinghua Science and Technology | 2021 / Vol. 26 / Issue 06

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

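Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text information processing]

Subject classification code:

Abstract: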

The internet delivers an abundance of news every day, so efficient algorithms for extracting keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms still need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to spread the weight of words, but its keyword weight propagation focuses only on word frequency. To improve the performance of the algorithm, we propose Semantic Clustering TextRank (SCTR), a semantic clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are grouped by k-means clustering to capture semantic similarity. Then, the clustering results are used to construct a TextRank weight transfer probability matrix. Finally, the word graph is computed iteratively and the keywords are extracted. The test target of this experiment is a Chinese news library. The results of the experiment conducted on this text set show that the SCTR algorithm achieves higher precision, recall, and F1 score than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.

Citation

Pages: 886-893

Page count: 8

