Towards unsupervised keyphrase extraction via an autoregressive approach

Cited by: 1
Authors
Li, Tuohang [1]
Hu, Liang [1]
Li, Hongtu [1]
Sun, Chengyu [1]
Li, Shuai [1]
Chi, Ling [1]
Affiliation
[1] Jilin Univ, Coll Comp Sci & Technol, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Keyphrase extraction; Autoregressive structure; Optimizer; Unsupervised model; Coverage decay optimizer
DOI
10.1016/j.knosys.2023.110664
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrase extraction is a technique used to capture the core information of documents and is an upstream task for advanced information retrieval systems, particularly in the academic realm. Current unsupervised methods are primarily built on a score-and-rank framework and consistently fail to capture the mutual information between extracted keyphrases; graph-based models are especially affected. Utilizing the autoregressive structure typically used in sequence-to-sequence text generation models, we propose a plug-and-play optimizer named C-Decay that can be integrated into any graph-based unsupervised keyphrase extraction model for a stable performance boost. C-Decay mitigates the bias toward certain semantically or lexically dominant tokens by directly optimizing the original score distribution output by graph-based models. The architecture of C-Decay includes the keyphrase pool, the gain vector, and the decay factor: the keyphrase pool realizes the autoregressive structure, and the gain vector and the decay factor serve as the optimization operators. Herein, we examine three graph-based models integrated with C-Decay, and experiments are conducted on four datasets: KDD, SemEval, Nguyen, and Krapivin. Moreover, we show that C-Decay improves accuracy and F-measure by an average of approximately 50% and 20%, respectively.
© 2023 Elsevier B.V. All rights reserved.
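The abstract describes C-Decay only at a high level: a keyphrase pool makes selection autoregressive, and a gain vector and decay factor re-weight the scores produced by a graph-based model. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation. It assumes the graph-based model outputs one score per token, that a candidate phrase's score is the sum of its token scores, and that the "gain vector" is a per-token multiplier shrunk by a fixed decay factor whenever a selected keyphrase covers that token. The function name `coverage_decay_rerank`, its parameters, and the toy scores are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def coverage_decay_rerank(
    token_scores: Dict[str, float],
    candidates: Sequence[Phrase],
    k: int,
    decay: float = 0.5,
) -> List[Phrase]:
    """Hypothetical sketch: greedy, autoregressive keyphrase selection in which
    tokens already covered by the selected pool are down-weighted."""
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")

    # Assumed "gain vector": one multiplicative weight per token, starting at 1.0.
    gain: Dict[str, float] = {tok: 1.0 for tok in token_scores}

    def phrase_score(phrase: Phrase) -> float:
        # Sum of the graph model's token scores, each scaled by its current gain.
        return sum(token_scores.get(t, 0.0) * gain.get(t, 1.0) for t in phrase)

    pool: List[Phrase] = []  # assumed "keyphrase pool": selections so far, in order
    remaining = list(candidates)
    while remaining and len(pool) < k:
        # Each step is rescored against the pool built so far (the autoregressive part).
        best = max(remaining, key=phrase_score)
        pool.append(best)
        remaining.remove(best)
        # Decay every token the new keyphrase covers, so later steps favour new content.
        for t in set(best):
            gain[t] = gain.get(t, 1.0) * decay
    return pool


if __name__ == "__main__":
    # Made-up token scores standing in for the output of a graph-based ranker.
    token_scores = {"neural": 3.0, "network": 2.5, "model": 2.0,
                    "keyphrase": 2.5, "extraction": 2.0, "graph": 1.0}
    candidates = [("neural", "network"), ("neural", "model"),
                  ("keyphrase", "extraction"), ("graph",)]
    print(coverage_decay_rerank(token_scores, candidates, k=3))
    # -> [('neural', 'network'), ('keyphrase', 'extraction'), ('neural', 'model')]
    print(coverage_decay_rerank(token_scores, candidates, k=3, decay=1.0))
    # -> [('neural', 'network'), ('neural', 'model'), ('keyphrase', 'extraction')]
```

With `decay=1.0` every gain stays at 1.0, so the loop reduces to plain score-and-rank ordering. With `decay=0.5`, the second phrase containing the already-covered token "neural" falls behind ("keyphrase", "extraction"), which is the kind of dominant-token bias the abstract targets. The paper's own definitions of the gain vector and decay factor may differ from this toy version.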
Pages: 10
Related papers (50 in total)
  • [31] TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique
    Rabby, Gollam
    Azad, Saiful
    Mahmud, Mufti
    Zamli, Kamal Z.
    Rahman, Mohammed Mostafizur
    COGNITIVE COMPUTATION, 2020, 12 (04) : 811 - 833
  • [32] AttentionRank: Unsupervised keyphrase Extraction using Self and Cross Attentions
    Ding, Haoran
    Luo, Xiao
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1919 - 1928
  • [33] Improving Diversity in Unsupervised Keyphrase Extraction with Determinantal Point Process
    Song, Mingyang
    Liu, Huafeng
    Jing, Liping
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 4294 - 4299
  • [34] SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation
    Alrehamy, Hassan H.
    Walker, Coral
    ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 650 : 222 - 235
  • [36] Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
    Liang, Xinnian
    Wu, Shuangzhi
    Li, Mu
    Li, Zhoujun
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 155 - 164
  • [38] An unsupervised keyphrase extraction model by incorporating structural and semantic information
    Luo, Linkai
    Zhang, Longmin
    Peng, Hong
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2020, 9 (01) : 77 - 83
  • [39] An Improved Approach to Bengali Keyphrase Extraction
    Sarkar, Kamal
    2014 FOURTH INTERNATIONAL CONFERENCE OF EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2014, : 283 - 288
  • [40] Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning
    Skrlj, Blaz
    Jukic, Marko
    Erzen, Nika
    Pollak, Senja
    Lavrac, Nada
    DISCOVERY SCIENCE (DS 2021), 2021, 12986 : 204 - 217