HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains

Cited by: 4
Authors
Merrouni, Zakariae Alami [1 ]
Frikh, Bouchra [1 ]
Ouhbi, Brahim [2 ]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Natl Sch Appl Sci ENSA, LIASSE Lab, BP 72,Route Dimouzer, Fes, Morocco
[2] Moulay Ismail Univ UMI, Natl Higher Sch Arts & Crafts ENSAM, Math Modeling & Comp Lab LM2I, Marjane 2,BP 4024, Meknes, Morocco
Keywords
Automatic keyphrase extraction; Unsupervised machine learning; Feature selection; FEATURE-SELECTION; KEYWORD EXTRACTION; SYSTEM;
DOI
10.1007/s12559-021-09979-7
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrases capture the main content of a free-text document. Automatic keyphrase extraction (AKPE) plays a significant role in retrieving and summarizing valuable information from documents across different domains. Various techniques have been proposed for this task. However, supervised AKPE requires large amounts of annotated data and depends on the domain it is tested on. An alternative is a domain-independent method that can be applied to several domains (such as medical or social). In this paper, we tackle keyphrase extraction from single documents with HAKE, a novel unsupervised method that mines linguistic, statistical, structural, and semantic text features simultaneously to select the most relevant keyphrases in a text. HAKE achieves higher F-scores than state-of-the-art unsupervised systems on standard datasets and is suitable for real-time processing of large amounts of Web and text data across different domains. With HAKE, we also explicitly increase coverage and diversity among the selected keyphrases by introducing a novel technique for candidate keyphrase identification and extraction, based on a parse-tree approach, part-of-speech tagging, and filtering. This technique generates a comprehensive and meaningful list of candidate keyphrases and reduces the size of the candidate set without increasing computational complexity. HAKE is compared with twelve recent and state-of-the-art unsupervised approaches, as well as with some supervised approaches. Experiments on five of the top available benchmark corpora from different domains validate the proposed method and show that HAKE significantly outperforms both the existing unsupervised and supervised methods. Our method requires no training on a particular set of documents and does not depend on external corpora, dictionaries, domain, or text size. Our experiments confirm that both HAKE's candidate selection model and its ranking model are effective.
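The abstract names the general pipeline (part-of-speech-based candidate identification with filtering, feature-based ranking, F-score evaluation) but not HAKE's actual formulas. The sketch below is a generic illustration of that kind of pipeline, not HAKE itself. It assumes tokens are already POS-tagged with Penn Treebank tags, uses a hypothetical `(JJ)*(NN*)+` noun-phrase pattern and a hypothetical stopword list, and ranks with an invented score that mixes a statistical feature (frequency) with a structural one (position of first occurrence). Evaluation uses exact-match F1; many benchmarks stem phrases first. The function names `extract_candidates`, `rank_candidates`, and `f1_at_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed input: a document as (word, Penn Treebank tag) pairs.
Tagged = List[Tuple[str, str]]

# Hypothetical stopword list used only for boundary filtering.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "with", "on"}


def extract_candidates(tokens: Tagged) -> List[Tuple[str, int]]:
    """Return (phrase, start_offset) for each maximal run matching (JJ)*(NN*)+."""
    candidates = []
    i, n = 0, len(tokens)
    while i < n:
        j = i
        while j < n and tokens[j][1].startswith("JJ"):  # optional adjectives
            j += 1
        k = j
        while k < n and tokens[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun, so the run is a candidate
            words = [w.lower() for w, _ in tokens[i:k]]
            # Filtering step: reject phrases that begin or end with a stopword.
            if words[0] not in STOPWORDS and words[-1] not in STOPWORDS:
                candidates.append((" ".join(words), i))
            i = k
        else:
            i += 1
    return candidates


def rank_candidates(tokens: Tagged, top_k: int = 5) -> List[str]:
    """Illustrative score: frequency * (2 - relative position of first occurrence)."""
    cands = extract_candidates(tokens)
    freq = Counter(p for p, _ in cands)
    first: Dict[str, int] = {}
    for phrase, pos in cands:
        first.setdefault(phrase, pos)
    n = max(len(tokens), 1)

    def score(p: str) -> float:
        return freq[p] * (2.0 - first[p] / n)

    # Higher score first; ties broken by earlier first occurrence.
    return sorted(freq, key=lambda p: (-score(p), first[p]))[:top_k]


def f1_at_k(predicted: List[str], gold: Set[str]) -> float:
    """Exact-match F1 between predicted and gold keyphrases (case-insensitive)."""
    pred = {p.lower() for p in predicted}
    gold_l = {g.lower() for g in gold}
    tp = len(pred & gold_l)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold_l)
    return 2 * precision * recall / (precision + recall)


tokens = [
    ("Keyphrase", "NN"), ("extraction", "NN"), ("finds", "VBZ"),
    ("salient", "JJ"), ("phrases", "NNS"), ("in", "IN"), ("a", "DT"),
    ("document", "NN"), (".", "."), ("Keyphrase", "NN"), ("extraction", "NN"),
    ("needs", "VBZ"), ("no", "DT"), ("training", "NN"), ("data", "NNS"), (".", "."),
]
top = rank_candidates(tokens, top_k=3)
print(top)  # ['keyphrase extraction', 'salient phrases', 'document']
print(round(f1_at_k(top, {"keyphrase extraction", "training data"}), 3))  # 0.4
```

The repeated phrase "keyphrase extraction" ranks first because both its frequency and its early first occurrence raise the score. A real system would replace this score with richer linguistic and semantic features.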
Pages: 852-874
Number of pages: 23
Related papers (50 in total)
  • [21] HCUKE: A Hierarchical Context-aware approach for Unsupervised Keyphrase Extraction
    Xu, Chun
    Mao, Xian-Ling
    Xin, Cheng-Xin
    Shang, Yu-Ming
    Che, Tian-Yi
    Mao, Hong-Li
    Huang, Heyan
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [22] PromptRank: Unsupervised Keyphrase Extraction Using Prompt
    Kong, Aobo
    Zhao, Shiwan
    Chen, Hao
    Li, Qicheng
    Qin, Yong
    Sun, Ruiqi
    Bai, Xiaoyan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 9788 - 9801
  • [23] How Preprocessing Affects Unsupervised Keyphrase Extraction
    Wang, Rui
    Liu, Wei
    McDonald, Chris
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PT I, 2014, 8403 : 163 - 176
  • [24] NamedKeys: Unsupervised Keyphrase Extraction for Biomedical Documents
    Gero, Zelalem
    Ho, Joyce C.
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 328 - 337
  • [25] Automatic keyphrase extraction: a survey and trends
    Merrouni, Zakariae Alami
    Frikh, Bouchra
    Ouhbi, Brahim
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2020, 54 (02) : 391 - 424
  • [26] Automatic tag recommendation approach with keyphrase extraction and word embedding techniques
    Konkaew, Taechawat
    Kitisin, Sukumal
    Journal of Computers (Taiwan), 2019, 30 (02) : 135 - 149
  • [27] Automatic Keyphrase Extraction Techniques: A Review
    Lim, Vicky Min-How
    Wong, Siew Fan
    Lim, Tong Ming
    2013 IEEE SYMPOSIUM ON COMPUTERS AND INFORMATICS (ISCI 2013), 2013,
  • [29] A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction
    Kumar, Niraj
    Srinathan, Kannan
    Varma, Vasudeva
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2016, 8 (02) : 124 - 143
  • [30] A Ranking Approach to Keyphrase Extraction
    Jiang, Xin
    Hu, Yunhua
    Li, Hang
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 756 - 757