Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering

Times cited: 7
Authors
Liu, Wenjun [1 ,2 ]
Sun, Yuyan [2 ]
Yu, Bao [2 ]
Wang, Hailan [2 ]
Peng, Qingcheng [2 ]
Hou, Mengshu [1 ,3 ]
Guo, Huan [2 ]
Wang, Hai [2 ]
Liu, Cheng [1 ,4 ]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[3] Chengdu Technol Univ, Sch Big Data & Artificial Intelligence, Chengdu 611730, Peoples R China
[4] 30th Res Inst China Elect Technol Grp Corp, Sci & Technol Commun Secur Lab, Chengdu 610041, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text Summarization; Sentence Vector; K-means Clustering; Word Embedding; TextRank Algorithm;
DOI
10.1016/j.knosys.2024.111447
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Automatic text summarization compresses a text into a summary that retains its important information, so that users can grasp the key content of the text by reading the summary alone. Extractive summarization is widely used in the research literature and is one of the main families of summarization methods. However, existing extractive methods still have problems: in clustering-based approaches the initial cluster centers are not selected carefully, and the extracted summaries are highly redundant for articles with complex sentences. To address these problems, this paper proposes an automatic text summarization method based on an improved TextRank algorithm and K-Means clustering. The method combines an improved BM25 model with the TextRank algorithm: it computes the BM25 similarity between sentences and uses it to obtain each sentence's TR score. The TR scores are then used to select the initial cluster centers through a similarity-difference judgment and a maximum judgment. The final summary is obtained by combining cluster scores and sentence scores. Experimental results on the DUC2004 dataset show that the proposed method outperforms the comparison algorithms Lead-3, TextRank and MBM25EMB on ROUGE-1, ROUGE-2 and ROUGE-L. In conclusion, the proposed method improves the accuracy of automatic text summarization and reduces redundancy in the summaries extracted from documents.
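The abstract describes the pipeline only at a high level. The sketch below is one plausible reading of it in plain Python, not the authors' implementation, and it rests on several assumptions. It uses standard Okapi BM25 (with a non-negative IDF) in place of the paper's improved BM25. It uses bag-of-words unit vectors in place of word-embedding sentence vectors. It also applies a hypothetical seeding rule (`pick_centers`, `max_sim`) for the similarity-difference and maximum judgments, and a hypothetical cluster score (mean TR of the members). All function names and parameter values are illustrative.

```python
import math
from collections import Counter

def bm25_matrix(sents, k1=1.5, b=0.75):
    """m[i][j] = Okapi BM25 score of sentence j (as document) for sentence i (as query).

    Assumption: plain BM25 with a non-negative IDF; NOT the paper's improved BM25.
    """
    n = len(sents)
    avgdl = sum(len(s) for s in sents) / n or 1.0
    df = Counter(w for s in sents for w in set(s))
    idf = {w: math.log((n - c + 0.5) / (c + 0.5) + 1) for w, c in df.items()}
    tfs = [Counter(s) for s in sents]
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in the sentence graph
            norm = k1 * (1 - b + b * len(sents[j]) / avgdl)
            for w in set(sents[i]):
                f = tfs[j][w]
                if f:
                    m[i][j] += idf[w] * f * (k1 + 1) / (f + norm)
    return m

def textrank(w, d=0.85, iters=50):
    """Weighted TextRank: TR(i) = (1 - d) + d * sum_j w[j][i] / sum_k w[j][k] * TR(j)."""
    n = len(w)
    out = [sum(row) for row in w]
    s = [1.0] * n
    for _ in range(iters):
        s = [(1 - d) + d * sum(w[j][i] / out[j] * s[j] for j in range(n) if out[j] > 0)
             for i in range(n)]
    return s

def unit_vectors(sents):
    """Bag-of-words sentence vectors scaled to unit length (stand-in for embeddings)."""
    vocab = sorted({w for s in sents for w in s})
    vecs = []
    for s in sents:
        c = Counter(s)
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def cos(a, b):
    return sum(x * y for x, y in zip(a, b))  # cosine, since inputs are unit vectors

def pick_centers(tr, vecs, k, max_sim=0.5):
    """Hypothetical seeding rule: highest-TR sentence first ("maximum"), then the next
    highest-TR sentences whose cosine to every chosen center is below max_sim ("difference")."""
    order = sorted(range(len(tr)), key=lambda i: -tr[i])
    centers = [order[0]]
    for i in order[1:]:
        if len(centers) < k and all(cos(vecs[i], vecs[c]) < max_sim for c in centers):
            centers.append(i)
    for i in order:  # fall back to plain TR order if too few dissimilar sentences exist
        if len(centers) < k and i not in centers:
            centers.append(i)
    return centers

def kmeans(vecs, init, iters=20):
    """Lloyd's K-means (squared Euclidean distance) seeded with the given sentence indices."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cents = [list(vecs[i]) for i in init]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [min(range(len(cents)), key=lambda c: sqdist(v, cents[c])) for v in vecs]
        for c in range(len(cents)):
            members = [vecs[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def summarize(sents, k=2, n=2):
    """Return indices (in document order) of the top-TR sentence of the n best clusters."""
    tr = textrank(bm25_matrix(sents))
    vecs = unit_vectors(sents)
    labels = kmeans(vecs, pick_centers(tr, vecs, k))
    picks = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        cluster_score = sum(tr[i] for i in members) / len(members)  # hypothetical: mean TR
        picks.append((cluster_score, max(members, key=lambda i: tr[i])))
    return sorted(i for _, i in sorted(picks, reverse=True)[:n])
```

Sentences are passed as lists of lowercase tokens, and `summarize(sents, k=2, n=2)` returns at most two sentence indices in document order. Seeding K-means from TR-ranked, mutually dissimilar sentences makes the initialization deterministic, unlike random seeding. It also places each initial center on a high-scoring sentence.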
Pages: 15
Related papers (50 in total)
  • [31] Improved K-means clustering algorithm based on user tag
    Tang J.
    Journal of Convergence Information Technology, 2010, 5 (10) : 124 - 130
  • [32] K-means Clustering Algorithm based on Improved Density Peak
    Wei, Debin
    Zhang, Zhenxing
    ACM International Conference Proceeding Series, 2023, : 105 - 109
  • [33] Video Classification Based On the Improved K-Means Clustering Algorithm
    Peng, Taile
    Zhang, Zhen
    Shen, Ke
    Jiang, Tao
    2019 5TH INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND MATERIAL APPLICATION, 2020, 440
  • [34] Load Forecasting Based on Improved K-means Clustering Algorithm
    Wang Yanbo
    Liu Li
    Pang Xinfu
    Fan Enpeng
    2018 CHINA INTERNATIONAL CONFERENCE ON ELECTRICITY DISTRIBUTION (CICED), 2018, : 2751 - 2755
  • [35] An Improved K-means Clustering Algorithm Based on Hadoop Platform
    Hou, Xiangru
    CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 1101 - 1109
  • [36] Research on Improved K-means Clustering Algorithm
    Zhang, Yinsheng
    Shan, Huilin
    Li, Jiaqiang
    Zhou, Jie
    MEMS, NANO AND SMART SYSTEMS, PTS 1-6, 2012, 403-408 : 1977 - 1980
  • [37] An Improved Kernel K-means Clustering Algorithm
    Liu, Yang
    Yin, Hong Peng
    Chai, Yi
    PROCEEDINGS OF 2016 CHINESE INTELLIGENT SYSTEMS CONFERENCE, VOL I, 2016, 404 : 275 - 280
  • [38] Research on improved K-means clustering algorithm
    Zhang, Yinsheng
    Shan, Huilin
    Li, Jiaqiang
    Zhou, Jie
    Advanced Materials Research, 2012, 403-408 : 1977 - 1980
  • [39] An Improved K-means Clustering Algorithm Based on Normal Matrix
    Tian Shengwen
    Zhao Yongsheng
    Wang Yilei
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION AND INSTRUMENTATION, VOL 4, 2008, : 2182 - 2185
  • [40] Improved K-means Algorithm Based on the Clustering Reliability Analysis
    Zhang, Hong
    Yu, Hong
    Li, Ying
    Hu, Baofang
    PROCEEDINGS OF THE 2015 INTERNATIONAL SYMPOSIUM ON COMPUTERS & INFORMATICS, 2015, 13 : 2516 - 2523