News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Cited by: 3
Authors
Ao Xiong [1]
Derong Liu [1]
Hongkang Tian [1]
Zhengyuan Liu [1]
Peng Yu [1]
Michel Kadoch [2]
Affiliations
[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
[2] École de Technologie Supérieure, Université du Québec
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Abstract
The Internet supplies an abundant stream of news every day, so efficient algorithms for extracting keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms still leave room for improvement. TextRank, which is derived from the PageRank algorithm, uses a word graph to propagate weight between words, but its keyword weight propagation considers only word frequency. To improve performance, we propose Semantic Clustering TextRank (SCTR), a news keyword extraction algorithm that adds semantic clustering to TextRank. First, word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are grouped with k-means clustering to obtain semantic clusters of words. Next, the clustering results are used to construct the TextRank weight transfer probability matrix. Finally, word scores are computed iteratively on the word graph and keywords are extracted. Experiments on a Chinese news corpus show that SCTR achieves higher precision, recall, and F1 score than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
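The abstract gives the SCTR steps only in outline, so the Python sketch below is one possible reading of them, not the authors' implementation. It rests on two loudly stated assumptions: small hand-made vectors stand in for BERT embeddings (no model is loaded), and the clustering result enters the weight transfer probability matrix through a hypothetical `boost` parameter that adds weight to co-occurrence edges between words in the same k-means cluster. The paper may combine the two differently. The ranking loop is the standard weighted TextRank update S(v) = (1 - d) + d * sum over neighbours u of w(u,v) / out(u) * S(u), and the hypothetical helper `precision_recall_f1` computes the set-based metrics used in the comparison.

```python
import random
from collections import defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster label per input point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sctr_keywords(sentences, vectors, n_clusters=2, window=2, boost=1.0,
                  d=0.85, max_iter=100, tol=1e-6, top_n=3):
    """Cluster-weighted TextRank sketch; `boost` is a hypothetical parameter."""
    vocab = sorted({w for s in sentences for w in s if w in vectors})
    labels = dict(zip(vocab, kmeans([vectors[w] for w in vocab], n_clusters)))

    # Undirected co-occurrence edges within a sliding window of `window` words.
    weight = defaultdict(float)
    for sent in sentences:
        words = [w for w in sent if w in labels]
        for i, u in enumerate(words):
            for v in words[i + 1:i + window]:
                if u == v:
                    continue
                # Assumed SCTR-style bias: same-cluster pairs get extra weight.
                w_uv = 1.0 + boost if labels[u] == labels[v] else 1.0
                weight[(u, v)] += w_uv
                weight[(v, u)] += w_uv

    out_sum = defaultdict(float)  # total outgoing edge weight of each word
    in_nbrs = defaultdict(list)   # words with an edge pointing at each word
    for (u, v), w in weight.items():
        out_sum[u] += w
        in_nbrs[v].append(u)

    # Weighted TextRank iteration until scores change by less than `tol`.
    score = {w: 1.0 for w in vocab}
    for _ in range(max_iter):
        new = {
            v: (1 - d) + d * sum(weight[(u, v)] / out_sum[u] * score[u] for u in in_nbrs[v])
            for v in vocab
        }
        converged = max(abs(new[v] - score[v]) for v in vocab) < tol
        score = new
        if converged:
            break
    return sorted(vocab, key=lambda w: (-score[w], w))[:top_n]


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of extracted vs. reference keywords."""
    hits = len(set(predicted) & set(gold))
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Toy 2-D vectors standing in for BERT embeddings (illustrative only).
    vecs = {"news": [0.0, 1.0], "keyword": [0.1, 0.9], "extraction": [0.2, 0.8],
            "graph": [1.0, 0.0], "model": [0.9, 0.1]}
    docs = [["news", "keyword", "extraction"], ["keyword", "graph", "model"], ["news", "graph"]]
    top = sctr_keywords(docs, vecs)
    print(top, precision_recall_f1(top, ["keyword", "graph"]))
```

Setting `boost=0.0` reduces the sketch to plain co-occurrence TextRank, which makes it easy to compare the two variants on the same input.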
Pages: 886-893
Page count: 8