Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering

Cited by: 7

Authors:

Liu, Wenjun [1,2]

Sun, Yuyan [2]

Yu, Bao [2]

Wang, Hailan [2]

Peng, Qingcheng [2]

Hou, Mengshu [1,3]

Guo, Huan [2]

Wang, Hai [2]

Liu, Cheng [1,4]

Affiliations:

[1] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[2] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China

[3] Chengdu Technol Univ, Sch Big Data & Artificial Intelligence, Chengdu 611730, Peoples R China

[4] 30th Res Inst China Elect Technol Grp Corp, Sci & Technol Commun Secur Lab, Chengdu 610041, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 287

Funding:

National Natural Science Foundation of China;

Keywords:

Text Summarization; Sentence Vector; K-means Clustering; Word Embedding; TextRank Algorithm;

DOI:

10.1016/j.knosys.2024.111447

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic text summarization compresses a text into a summary while retaining its important information, so that users can grasp the main content of the text by reading the summary. Extractive summarization is widely used in the research literature and is one of the main lines of research on summarization. However, existing extractive methods still have problems: the initial cluster centers are not chosen carefully, and the resulting summaries are highly redundant for articles with complex sentences. To address these problems, this paper proposes an automatic text summarization method based on an improved TextRank algorithm and K-Means clustering. The method combines an improved BM25 model with the TextRank algorithm: it calculates the BM25 similarity between sentences and from it obtains a TR score for each sentence. The TR scores are then used to select the initial cluster centers, based on similarity difference judgment and maximum judgment. The final summary is obtained by combining the cluster scores and the sentence scores. Experimental results on the DUC2004 dataset show that the proposed method achieves better ROUGE-1, ROUGE-2 and ROUGE-L scores than the comparison algorithms Lead-3, TextRank and MBM25EMB. In conclusion, the proposed method improves the accuracy of automatic text summarization and reduces the redundancy of the summaries.
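The scoring step described in the abstract (BM25 similarity between sentences, then TextRank over the resulting graph) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard Okapi BM25 formula rather than the paper's improved BM25 model, whitespace tokenization, and commonly used default parameters (`k1`, `b`, `damping`). The function names are hypothetical, and the selection of initial cluster centers and the K-Means stage are not shown.

```python
import math
from collections import Counter


def bm25_similarity_matrix(sentences, k1=1.5, b=0.75):
    """sim[i][j]: standard Okapi BM25 score of sentence j, with sentence i as the query.

    Assumption: plain BM25, NOT the improved BM25 model from the paper.
    Expects a non-empty list of non-empty sentences.
    """
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of sentences containing each term.
    df = Counter(term for d in docs for term in set(d))
    # Smoothed IDF, which stays positive for every term.
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    tfs = [Counter(d) for d in docs]

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in the sentence graph
            score = 0.0
            for term in set(docs[i]):
                tf = tfs[j][term]
                if tf:
                    norm = tf + k1 * (1 - b + b * len(docs[j]) / avg_len)
                    score += idf[term] * tf * (k1 + 1) / norm
            sim[i][j] = score
    return sim


def textrank_scores(sim, damping=0.85, iterations=50):
    """Weighted PageRank over the similarity graph (fixed number of iterations)."""
    n = len(sim)
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Each sentence i passes on its score in proportion to sim[i][j].
            rank = sum(
                scores[i] * sim[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


if __name__ == "__main__":
    example = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "stock markets fell sharply today",
    ]
    for sentence, score in zip(example, textrank_scores(bm25_similarity_matrix(example))):
        print(f"{score:.3f}  {sentence}")
```

In this sketch, a sentence that shares no terms with any other sentence receives only the baseline score `(1 - damping) / n`, while sentences with overlapping vocabulary reinforce each other. According to the abstract, scores of this kind (the TR scores) are what the method then uses to choose the initial K-Means cluster centers.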


Pages: 15
