Domain-independent automatic keyphrase indexing with small training sets

Cited by: 55
Authors
Medelyan, Ena [1]
Witten, Ian H. [1]
Affiliations
[1] Univ Waikato, Dept Comp Sci, Hamilton 3240, New Zealand
DOI
10.1002/asi.20790
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Pages: 1026-1040
Page count: 15