News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Cited by: 3
Authors
Ao Xiong [1]
Derong Liu [1]
Hongkang Tian [1]
Zhengyuan Liu [1]
Peng Yu [1]
Michel Kadoch [2]
Affiliations
[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
[2] École de Technologie Supérieure, Université du Québec
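Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification code
Abstract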
The internet delivers an abundance of news every day, so efficient algorithms for extracting keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms still need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to propagate word weights, but its weight propagation considers only word frequency. To improve performance, we propose Semantic Clustering TextRank (SCTR), a semantic-clustering news keyword extraction algorithm based on TextRank. First, word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are clustered with k-means to capture semantic groupings. Then, the clustering results are used to construct the TextRank weight transfer probability matrix. Finally, the word graph is computed iteratively and keywords are extracted. The test corpus is a Chinese news library. Experiments on this corpus show that SCTR achieves higher precision, recall, and F1 than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
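The three steps named in the abstract (cluster word vectors, build a transition matrix from the clusters, iterate over the word graph) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not say how the clustering results enter the matrix, so the rule used here (co-occurrence counts multiplied by a hypothetical `same_cluster_boost` when two words share a cluster) is an assumption, as are all function names, the window size, and the damping factor. The two-dimensional toy vectors stand in for BERT embeddings, which are not computed here.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster id per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def transition_matrix(tokens, vocab, labels, window=2, same_cluster_boost=2.0):
    """Row-stochastic matrix from co-occurrence counts.

    ASSUMPTION: pairs in the same cluster get their count multiplied by
    same_cluster_boost; the abstract does not specify the actual rule.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    weights = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for other in tokens[pos + 1 : pos + 1 + window]:
            i, j = index[word], index[other]
            if i == j:
                continue
            boost = same_cluster_boost if labels[i] == labels[j] else 1.0
            weights[i][j] += boost
            weights[j][i] += boost
    matrix = []
    for row in weights:
        total = sum(row)
        # Isolated words fall back to a uniform row so every row sums to 1.
        matrix.append([w / total for w in row] if total else [1.0 / n] * n)
    return matrix


def textrank(matrix, damping=0.85, iters=50):
    """PageRank-style power iteration over the word graph."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(scores[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def sctr_keywords(tokens, vectors, k=2, top_n=3):
    """Cluster the vocabulary, build the matrix, rank words, return the top_n."""
    tokens = [t for t in tokens if t in vectors]
    vocab = sorted(vectors)
    labels = kmeans([vectors[w] for w in vocab], k)
    scores = textrank(transition_matrix(tokens, vocab, labels))
    ranked = sorted(zip(vocab, scores), key=lambda pair: -pair[1])
    return [word for word, _ in ranked[:top_n]]


# Toy input: hypothetical 2-D vectors standing in for BERT word embeddings.
tokens = ["market", "stocks", "rise", "market", "bank", "stocks", "weather", "rain"]
vectors = {
    "market": [1.0, 0.1], "stocks": [0.9, 0.2], "bank": [0.8, 0.0],
    "rise": [0.5, 0.5], "weather": [0.0, 1.0], "rain": [0.1, 0.9],
}
print(sctr_keywords(tokens, vectors, k=2, top_n=3))
```

Setting `same_cluster_boost=1.0` reduces the matrix to plain co-occurrence weights, which gives a frequency-only TextRank baseline to compare against under the same assumptions.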
Pages: 886-893
Number of pages: 8
Related papers
50 in total
  • [41] KEYWORD EXTRACTION BASED ON WORD SYNONYMS USING WORD2VEC
    Ogul, Iskender Ulgen
    Ozcan, Caner
    Hakdagli, Ozlem
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [42] Analysis and Implementation of Graph Clustering for Digital News Using Star Clustering Algorithm
    Ahdi, A. B.
    Sw, K. R.
    Herdiani, A.
    1ST INTERNATIONAL CONFERENCE ON COMPUTING AND APPLIED INFORMATICS 2016 : APPLIED INFORMATICS TOWARD SMART ENVIRONMENT, PEOPLE, AND SOCIETY, 2017, 801
  • [43] The Model of Keyword Information Retrieval Based on Semantic
    Dong, Wanxin
    Li, Xin
    Zhou, Sheng
    2010 INTERNATIONAL CONFERENCE ON INFORMATION, ELECTRONIC AND COMPUTER SCIENCE, VOLS 1-3, 2010, : 662 - +
  • [44] Hyponymy Graph Model for Word Semantic Similarity Measurement
    Wang Junhua
    Zuo Wanli
    Peng Tao
    CHINESE JOURNAL OF ELECTRONICS, 2015, 24 (01) : 96 - 101
  • [45] Hyponymy Graph Model for Word Semantic Similarity Measurement
    WANG Junhua
    ZUO Wanli
    PENG Tao
    Chinese Journal of Electronics, 2015, 24 (01) : 96 - 101
  • [46] Keyword search algorithm of large graph based on GPU
    Lin H.-X.
    Qiao L.-P.
    Yuan Y.
    Wang G.-R.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (02) : 271 - 279
  • [47] A keyword-based semantic prefetching approach in Internet news services
    Xu, CZ
    Ibrahim, TI
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (05) : 601 - 611
  • [48] Digital News Graph Clustering using Chinese Whispers Algorithm
    Pratama, M. F. E.
    Kemas, R. S. W.
    Anisa, H.
    1ST INTERNATIONAL CONFERENCE ON COMPUTING AND APPLIED INFORMATICS 2016 : APPLIED INFORMATICS TOWARD SMART ENVIRONMENT, PEOPLE, AND SOCIETY, 2017, 801
  • [49] Implementation of MCL Algorithm in Clustering Digital News with Graph Representation
    Al-Fath, Alwan M. Ubaidillah
    Saleh, Kemas Rahmat W.
    Siti Sa'adah, M. T.
    2016 4TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2016,
  • [50] A Modified Approach to Keyword Extraction Based on Word-similarity
    Meng Wenchao
    Liu Lianchen
    Dai Ting
    2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 3, 2009, : 388 - 392