A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Times cited: 1
Authors
Kumar, Niraj [1]
Srinathan, Kannan [1]
Varma, Vasudeva [1]
Affiliation
[1] IIIT Hyderabad, Hyderabad 500032, Andhra Pradesh, India
Keywords
keyphrase extraction; weighted betweenness centrality; N-gram graph; normalised pointwise mutual information; NPMI
DOI
10.1504/IJDMMM.2016.077198
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel N-gram (N ≥ 1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we introduce the combined use of: 1) a statistical feature, obtained from the weighted betweenness centrality scores of words (a measure generally used to identify border nodes/edges in community detection techniques); and 2) collocation strength, calculated using nearest-neighbour DBpedia texts. We also introduce an N-gram (N ≥ 1) graph, which reduces the biasing effect of shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection to the proposed N-gram graph. Our experimental results show that the devised system performs better than current state-of-the-art unsupervised systems and comparably to or better than supervised systems.
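A minimal Python sketch of the two scoring ingredients named in the abstract, offered as illustration only. It assumes, as the keyword list suggests, that collocation strength is measured with normalised PMI, and that betweenness is computed on an undirected word co-occurrence graph whose edge length is the reciprocal of the co-occurrence count (a hypothetical convention so that strongly linked words are "closer"). The function names `npmi` and `weighted_betweenness` and the count inputs are hypothetical; the DBpedia neighbour retrieval, the N-gram graph construction, the way the two scores are combined, and the key-player detection step are not reproduced here.

```python
import heapq
import math
from collections import defaultdict
from fractions import Fraction


def npmi(pair_count, count_x, count_y, total):
    """Normalised PMI of two terms from raw counts, in [-1, 1].

    npmi(x, y) = log(p(x, y) / (p(x) p(y))) / -log p(x, y)
    Counts are assumed to come from some reference corpus; how that
    corpus is retrieved (DBpedia neighbours in the paper) is not shown.
    """
    if pair_count == 0:
        return -1.0  # the terms never co-occur
    p_xy = pair_count / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # degenerate case: -log p(x, y) would be 0
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


def weighted_betweenness(cooccurrence):
    """Brandes-style betweenness on an undirected weighted word graph.

    `cooccurrence` maps (word_a, word_b) -> co-occurrence count. Edge length
    is 1 / count (hypothetical choice). Fraction keeps path lengths exact so
    equal-length shortest paths are detected reliably.
    """
    adj = defaultdict(dict)
    for (u, v), count in cooccurrence.items():
        adj[u][v] = adj[v][u] = Fraction(1, count)
    nodes = list(adj)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Dijkstra from s, counting shortest paths (sigma) and predecessors.
        dist = {s: Fraction(0)}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order, done = [], set()
        heap = [(Fraction(0), s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in adj[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]  # another shortest path to w
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every (s, t) pair was counted from both ends.
    return {v: c / 2 for v, c in score.items()}
```

For example, `weighted_betweenness({("graph", "based"): 3, ("based", "keyphrase"): 1})` gives `based` a score of 1.0 and the two end words 0.0, since `based` lies on the only path between them. Exact fractions are used instead of floats because sums such as 1/3 + 1/3 + 1/3 need not compare equal to 1 in floating point, which would silently drop tied shortest paths.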
Pages: 124-143
Page count: 20
Related papers
  • [1] Automatic Keyphrase Extraction from Scientific Documents Using N-gram Filtration Technique
    Kumar, Niraj
    Srinathan, Kannan
    DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2008, : 199 - 208
  • [2] A Graph-based Approach of Automatic Keyphrase Extraction
    Yan Ying
    Tan Qingping
    Xie Qinzheng
    Zeng Ping
    Li Panpan
    ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY, 2017, 107 : 248 - 255
  • [3] An N-Gram Based Method for Bengali Keyphrase Extraction
    Sarkar, Kamal
    INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 36 - 41
  • [4] Automatic Keyphrase Extraction using Graph-based Methods
    Mothe, Josiane
    Ramiandrisoa, Faneva
    Rasolomanana, Michael
    33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018, : 728 - 730
  • [5] ISKE: An unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method
    Chi, Ling
    Hu, Liang
    KNOWLEDGE-BASED SYSTEMS, 2021, 223
  • [6] Graph-based Keyphrase Extraction Using Word and Document Embeddings
    Zu, Xian
    Xie, Fei
    Liu, Xiaojian
    11TH IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG 2020), 2020, : 70 - 76
  • [7] N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules
    Liu, Shengchao
    Demirel, Mehmet Furkan
    Liang, Yingyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] KEST: A graph-based keyphrase extraction technique for tweets summarization using Markov Decision Process
    Garg, Muskan
    Kumar, Mukesh
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 209
  • [9] TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique
    Rabby, Gollam
    Azad, Saiful
    Mahmud, Mufti
    Zamli, Kamal Z.
    Rahman, Mohammed Mostafizur
    COGNITIVE COMPUTATION, 2020, 12 (04) : 811 - 833