HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains

Cited by: 4
Authors
Merrouni, Zakariae Alami [1]
Frikh, Bouchra [1]
Ouhbi, Brahim [2]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Natl Sch Appl Sci ENSA, LIASSE Lab, BP 72, Route Dimouzer, Fes, Morocco
[2] Moulay Ismail Univ UMI, Natl Higher Sch Arts & Crafts ENSAM, Math Modeling & Comp Lab LM2I, Marjane 2, BP 4024, Meknes, Morocco
Keywords
Automatic keyphrase extraction; Unsupervised machine learning; Feature selection; FEATURE-SELECTION; KEYWORD EXTRACTION; SYSTEM
DOI
10.1007/s12559-021-09979-7
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrases capture the main content of a free-text document. Automatic keyphrase extraction (AKPE) plays a significant role in retrieving and summarizing valuable information from documents across different domains. Various techniques have been proposed for this task; however, supervised AKPE requires large amounts of annotated data and depends on the domain it is tested on. An alternative is a domain-independent method that can be applied to many domains (e.g., medical or social text). In this paper, we tackle keyphrase extraction from single documents with HAKE, a novel unsupervised method that simultaneously mines linguistic, statistical, structural, and semantic text features to select the most relevant keyphrases in a text. HAKE achieves higher F-scores than state-of-the-art unsupervised systems on standard datasets and is suitable for real-time processing of large amounts of Web and text data across different domains. With HAKE, we also explicitly increase coverage and diversity among the selected keyphrases by introducing a novel technique for candidate keyphrase identification and extraction, based on a parse-tree approach, part-of-speech tagging, and filtering. This technique generates a comprehensive and meaningful list of candidate keyphrases while reducing the size of the candidate set without increasing computational complexity. HAKE is compared with twelve recent, state-of-the-art unsupervised approaches, as well as with several supervised approaches. Experiments on five leading publicly available benchmark corpora from different domains validate the proposed method and show that HAKE significantly outperforms both existing unsupervised and supervised methods. Our method requires no training on a particular set of documents and does not depend on external corpora, dictionaries, the domain, or the text size.
Our experiments confirm that both HAKE's candidate selection model and its ranking model are effective.
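The abstract describes HAKE's pipeline only at a high level: candidate identification (parse tree, part-of-speech tagging, filtering) followed by feature-based ranking. The Python sketch below is a hypothetical, heavily simplified illustration of that two-stage shape, not HAKE itself. It assumes the input is already POS-tagged with Penn Treebank tags, substitutes a plain adjective-noun pattern (JJ* NN+) for HAKE's parse-tree technique, uses a length cap as its only filter, and scores candidates with just two illustrative features (frequency and first-occurrence position). The function names `extract_candidates` and `rank_keyphrases` and the parameters `top_k` and `max_words` are invented for this example.

```python
from collections import Counter


def extract_candidates(tagged):
    """Hypothetical candidate step: return (phrase, start_index) pairs for
    maximal runs of zero or more adjectives (JJ*) followed by one or more
    nouns (NN*). A stand-in for HAKE's parse-tree technique, not a copy."""
    candidates = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1  # consume leading adjectives
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1  # consume the noun sequence
        if k > j:  # pattern matched: at least one noun
            phrase = " ".join(w.lower() for w, _ in tagged[i:k])
            candidates.append((phrase, i))
            i = k
        else:  # no noun here: skip the adjectives (or the single token)
            i = max(j, i + 1)
    return candidates


def rank_keyphrases(tagged, top_k=3, max_words=4):
    """Hypothetical ranking step: score = frequency x position weight,
    where earlier first occurrences get a weight closer to 1."""
    # Filtering step (assumed): drop candidates longer than max_words.
    cands = [(p, s) for p, s in extract_candidates(tagged)
             if len(p.split()) <= max_words]
    freq = Counter(p for p, _ in cands)
    first = {}
    for p, s in cands:
        first.setdefault(p, s)  # token index of the first occurrence
    n = len(tagged)
    scores = {p: freq[p] * (1.0 - first[p] / n) for p in freq}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Example input, already tagged with Penn Treebank POS tags.
tagged = [("Automatic", "JJ"), ("keyphrase", "NN"), ("extraction", "NN"),
          ("selects", "VBZ"), ("keyphrases", "NNS"), ("from", "IN"),
          ("a", "DT"), ("document", "NN"), (".", "."),
          ("Keyphrases", "NNS"), ("summarize", "VBP"), ("the", "DT"),
          ("document", "NN"), (".", ".")]

for phrase, score in rank_keyphrases(tagged):
    print(f"{phrase}: {score:.3f}")
```

On this toy input the sketch ranks "keyphrases" first (two occurrences, appearing early), followed by "automatic keyphrase extraction" and "document". In practice the `(word, tag)` pairs would come from a POS tagger such as NLTK's `pos_tag`; the linguistic and semantic features the abstract attributes to HAKE are not modeled here.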
Pages: 852-874
Number of pages: 23
Related papers (50 in total)
  • [31] HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction
    Song, Mingyang
    Liu, Huafeng
    Jing, Liping
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 16070 - 16080
  • [32] A New Scheme for Scoring Phrases in Unsupervised Keyphrase Extraction
    Florescu, Corina
    Caragea, Cornelia
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 477 - 483
  • [33] Unsupervised Keyphrase Extraction via Interpretable Neural Networks
    Joshi, Rishabh
    Balachandran, Vidhisha
    Saldanha, Emily
    Glenski, Maria
    Volkova, Svitlana
    Tsvetkov, Yulia
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1107 - 1119
  • [36] Automatic keyphrase extraction using word embeddings
    Zhang, Yuxiang
    Liu, Huan
    Wang, Suge
    Ip, W. H.
    Fan, Wei
    Xiao, Chunjing
    SOFT COMPUTING, 2020, 24 (08) : 5593 - 5608
  • [37] Automatic Keyphrase Extraction with a Refined Candidate Set
    You, Wei
    Fontaine, Dominique
    Barthes, Jean-Paul
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1, 2009, : 576 - 579
  • [38] Automatic keyphrase extraction from scientific articles
    Kim, Su Nam
    Medelyan, Olena
    Kan, Min-Yen
    Baldwin, Timothy
    LANGUAGE RESOURCES AND EVALUATION, 2013, 47 : 723 - 742
  • [39] An automatic keyphrase extraction system for scientific documents
    You, Wei
    Fontaine, Dominique
    Barthes, Jean-Paul
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (03) : 691 - 724
  • [40] Automatic keyphrase extraction from Chinese books
    Chen, Yijiang
    Shi, Xiaodong
    Zhou, Changle
    Su, Chang
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 92 - +