Domain-independent automatic keyphrase indexing with small training sets

Cited by: 55
Authors
Medelyan, Ena [1 ]
Witten, Ian H. [1 ]
Affiliation
[1] Univ Waikato, Dept Comp Sci, Hamilton 3240, New Zealand
DOI
10.1002/asi.20790
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Pages: 1026-1040
Page count: 15