News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Cited by: 3
Authors
Ao Xiong [1]
Derong Liu [1]
Hongkang Tian [1]
Zhengyuan Liu [1]
Peng Yu [1]
Michel Kadoch [2]
Affiliations
[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
[2] École de Technologie Supérieure, Université du Québec
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Abstract
The internet supplies an abundance of news every day, so efficient algorithms that extract keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms still need improvement. TextRank, which is derived from the PageRank algorithm, uses a word graph to propagate weight among words, but its keyword weight propagation considers only word frequency. To improve the performance of the algorithm, we propose Semantic Clustering TextRank (SCTR), a semantic clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are clustered with k-means to group semantically related words. Then, the clustering results are used to construct the TextRank weight transfer probability matrix. Finally, the word graph is computed iteratively and the keywords are extracted. The experiments use a Chinese news corpus. The results on this text set show that the SCTR algorithm achieves higher precision, recall, and F1 value than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
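The three steps named in the abstract (clustering word vectors, building a transfer probability matrix from the clusters, iterating over the word graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the clustering results enter the matrix, so the sketch assumes that co-occurrence edges between words in the same cluster are multiplied by a hypothetical `same_cluster_boost` factor. The 2-D `toy_vectors` stand in for BERT embeddings, English tokens stand in for segmented Chinese news text, and all function names are illustrative.

```python
import math


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each vector.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cooccurrence(tokens, window=2):
    """Undirected co-occurrence counts within a sliding window."""
    words = sorted(set(tokens))
    index = {w: i for i, w in enumerate(words)}
    counts = [[0.0] * len(words) for _ in words]
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[w], index[tokens[j]]
            if a != b:
                counts[a][b] += 1.0
                counts[b][a] += 1.0
    return words, counts


def transition_matrix(counts, labels, same_cluster_boost=2.0):
    """ASSUMPTION: same-cluster edges are boosted, then rows are normalised."""
    n = len(counts)
    matrix = []
    for i in range(n):
        row = [counts[i][j] * (same_cluster_boost if labels[i] == labels[j] else 1.0)
               for j in range(n)]
        total = sum(row)
        # A word with no neighbours spreads its weight uniformly.
        matrix.append([x / total for x in row] if total else [1.0 / n] * n)
    return matrix


def textrank(matrix, damping=0.85, iters=50):
    """Power iteration in the normalised PageRank form (scores sum to 1)."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1.0 - damping) / n
                  + damping * sum(scores[i] * matrix[i][j] for i in range(n))
                  for j in range(n)]
    return scores


# Toy input: hypothetical tokens and hand-made 2-D vectors (not BERT output).
tokens = ["market", "stock", "price", "market", "rain",
          "stock", "price", "weather", "rain", "market"]
toy_vectors = {"market": [1.0, 0.1], "stock": [0.9, 0.2], "price": [0.8, 0.0],
               "rain": [0.1, 1.0], "weather": [0.0, 0.9]}

words, counts = cooccurrence(tokens)
labels = kmeans([toy_vectors[w] for w in words], k=2)
label_of = dict(zip(words, labels))
matrix = transition_matrix(counts, labels)
scores = textrank(matrix)
keywords = sorted(words, key=lambda w: -scores[words.index(w)])[:3]
print(label_of)
print(keywords)
```

Setting `same_cluster_boost=1.0` reduces the sketch to frequency-only propagation over the co-occurrence graph, which makes it easy to compare rankings with and without the cluster information.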
Pages: 886-893
Number of pages: 8
Related papers
50 in total
  • [21] RETRACTED: TextRank Keyword Extraction Algorithm Using Word Vector Clustering Based on Rough Data-Deduction (Retracted Article)
    Zhou, Ning
    Shi, Wenqian
    Liang, Renyu
    Zhong, Na
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [22] A Semantic Keyword Search Based on the Bidirectional Fix Root Query Graph Construction Algorithm
    Sitthisarn, Siraya
    SEMANTIC TECHNOLOGY (JIST 2014), 2015, 8943 : 387 - 394
  • [23] A graph based keyword extraction model using collective node weight
    Biswas, Saroj Kr.
    Bordoloi, Monali
    Shreya, Jacob
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 97 : 51 - 59
  • [24] Chinese keyword extraction based on word platform
    Jiao, Hui
    Liu, Qian
    Jia, Hui-bo
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2007, : 360 - +
  • [25] Abductive reasoning for keyword recovering in semantic-based keyword extraction
    Kongkachandra, Rachada
    Chamnongthai, Kosin
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008, : 714 - +
  • [26] Semantic Navigation of Keyword Search Based on Knowledge Graph
    Peng, Bo
    Chen, Guohua
    Tang, Yong
    Sun, Saimei
    Sun, Yuxia
    12TH CHINESE CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CHINESECSCW 2017), 2017, : 189 - 192
  • [27] A Four-Feature Keyword Extraction Algorithm Based on Word Length Priority Ratio
    Kang, Hui
    Lu, Lingfeng
    Su, Hang
    2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 1492 - 1498
  • [28] SIFRANK Algorithm for Chinese Text Keyword Extraction Based on Dependent Semantic Feature Constraints
    Zhang, Qian
    Wang, Tiancheng
    Zhu, Mengyuan
    Shen, Tao
    Zhao, Yilin
    Zhang, Yunwei
    2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 1652 - 1657
  • [29] Medical image clustering algorithm based on graph model
    Pan, Hai-Wei
    Gu, Jing-Zi
    Han, Qi-Long
    Xie, Xiao-Qin
    Zhang, Zhi-Qiang
    Rong, Jing-Shi
    Ruan Jian Xue Bao/Journal of Software, 2013, 24 (SUPPL.2) : 178 - 187
  • [30] An Improved Keyword Extraction Method Using Graph Based Random Walk Model
    Islam, Md. Rafiqul
    Islam, Md. Rakibul
    2008 11TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY: ICCIT 2008, VOLS 1 AND 2, 2008, : 256 - 260