Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering

Cited by: 7
Authors
Liu, Wenjun [1,2]
Sun, Yuyan [2]
Yu, Bao [2]
Wang, Hailan [2]
Peng, Qingcheng [2]
Hou, Mengshu [1,3]
Guo, Huan [2]
Wang, Hai [2]
Liu, Cheng [1,4]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[3] Chengdu Technol Univ, Sch Big Data & Artificial Intelligence, Chengdu 611730, Peoples R China
[4] 30th Res Inst China Elect Technol Grp Corp, Sci & Technol Commun Secur Lab, Chengdu 610041, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text Summarization; Sentence Vector; K-means Clustering; Word Embedding; TextRank Algorithm
DOI
10.1016/j.knosys.2024.111447
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text summarization aims to produce a summary by compressing a text while retaining its important information, so that users can grasp the key content of the text by reading the summary alone. In the research literature, extractive summarization is widely used and is one of the main families of summarization methods. However, extractive methods still have problems: the initial cluster centers are not carefully selected, and the summaries of articles with complex sentences contain highly redundant sentences. To address these problems, this paper proposes an automatic text summarization method based on an improved TextRank algorithm and K-Means clustering. The method combines an improved BM25 model with the TextRank algorithm to compute the BM25 similarity between sentences and obtain the TR score of each sentence. The TR scores are then used to select the initial cluster centers based on a similarity-difference judgment and a maximum judgment. The final summary is obtained by combining cluster scores with sentence scores. Experimental results on the DUC2004 dataset show that the proposed method achieves better ROUGE-1, ROUGE-2 and ROUGE-L scores than the comparison algorithms, including Lead-3, TextRank and MBM25EMB. In conclusion, the proposed method improves the accuracy of automatic text summarization and reduces the redundancy of the summaries extracted from documents.
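The scoring stage described in the abstract (sentence-to-sentence BM25 similarity feeding a TextRank graph to produce TR scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses standard Okapi BM25 with a common non-negative IDF variant and the usual defaults k1 = 1.5, b = 0.75, rather than the paper's improved BM25, whose modifications the abstract does not specify; it averages the two directions of the asymmetric BM25 score to get an undirected edge weight (an assumption); and it runs weighted TextRank with damping factor d = 0.85. The function names `bm25_matrix`, `symmetrize` and `textrank` are hypothetical. The K-means step (choosing initial centers from the TR scores) is left out because the abstract does not define its similarity-difference and maximum judgments.

```python
import math
from collections import Counter


def bm25_matrix(sentences, k1=1.5, b=0.75):
    """sim[i][j] = standard Okapi BM25 score of sentence j (as the query)
    against sentence i (as the document). Each sentence is a token list;
    the sentences of one article form the BM25 "corpus".
    Assumption: plain BM25, not the paper's improved variant."""
    n = len(sentences)
    avgdl = sum(len(s) for s in sentences) / n
    df = Counter()
    for s in sentences:
        df.update(set(s))  # document frequency: each term counted once per sentence
    # Non-negative IDF variant (+1 inside the log), as in Lucene's BM25.
    idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}
    tfs = [Counter(s) for s in sentences]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Length normalization term for "document" sentence i.
        norm = k1 * (1 - b + b * len(sentences[i]) / avgdl)
        for j in range(n):
            if i == j:
                continue  # no self-loops in the sentence graph
            score = 0.0
            for t in set(sentences[j]):
                tf = tfs[i][t]  # Counter returns 0 for absent terms
                if tf:
                    score += idf[t] * tf * (k1 + 1) / (tf + norm)
            sim[i][j] = score
    return sim


def symmetrize(sim):
    """Average both BM25 directions into one undirected edge weight (assumption)."""
    n = len(sim)
    return [[(sim[i][j] + sim[j][i]) / 2 for j in range(n)] for i in range(n)]


def textrank(w, d=0.85, tol=1e-8, max_iter=200):
    """Weighted TextRank by power iteration:
    WS(i) = (1 - d) + d * sum_j [w[j][i] / sum_k w[j][k]] * WS(j)."""
    n = len(w)
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0  # skip isolated sentences
            )
            for i in range(n)
        ]
        converged = max(abs(a - c) for a, c in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


sentences = [
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
    "dogs chase cats".split(),
]
tr = textrank(symmetrize(bm25_matrix(sentences)))
print([round(s, 4) for s in tr])  # → [1.0, 1.0, 0.15]
```

In this toy input, the first two sentences share most of their terms and reinforce each other, while the third shares no term with either (the sketch does no stemming, so "cats" does not match "cat") and keeps only the baseline score 1 - d. In a full pipeline these TR scores would then feed the cluster-center selection and the final combination of cluster and sentence scores.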
Pages: 15
Related papers
50 in total
  • [1] Automatic Extractive Text Summarization using K-Means Clustering
    Shetty, Krithi
    Kallimani, Jagadish S.
    2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017: 881-890
  • [2] An Improved Method Based on the Density and K-means Nearest Neighbor Text Clustering Algorithm
    Fan, Xiaojing
    Jiang, Mingyang
    Pei, Zhili
    Qiao, Shicheng
    Lian, Jie
    Wang, Chaoyong
    2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY FOR EDUCATION (ICTE 2015), 2015: 312-315
  • [3] Improved K-Means algorithm in text semantic clustering
    Ma, Junhong
    Open Cybernetics and Systemics Journal, 2014, 8: 530-534
  • [4] An improved K-Means text clustering algorithm based on Local Search
    Liu, Xiangwei
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008: 11578-11581
  • [5] An Improved K-Means Clustering Algorithm Based on Spectral Method
    Tian, Shengwen
    Yang, Hongyong
    Wang, Yilei
    Li, Ali
    ADVANCES IN COMPUTATION AND INTELLIGENCE, PROCEEDINGS, 2008, 5370: 530-536
  • [6] Automatic Text Summarization Using Gensim Word2Vec and K-Means Clustering Algorithm
    Haider, Mofiz Mojib
    Hossin, Md Arman
    Mahi, Hasibur Rashid
    Arif, Hossain
    2020 IEEE REGION 10 SYMPOSIUM (TENSYMP) - TECHNOLOGY FOR IMPACTFUL SUSTAINABLE DEVELOPMENT, 2020: 283-286
  • [7] Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010: 63-67
  • [8] A new Chinese text clustering algorithm based on WRD and improved K-means
    Cui, Zicai
    Zhong, Bocheng
    Bai, Chen
    INTELLIGENT DATA ANALYSIS, 2023, 27(4): 1205-1220
  • [9] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33: 301-307
  • [10] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011: 90-93