Research on Methods for Extracting Domain Ontology Terms

Cited by: 8
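Authors
Zhang Leihan (张雷瀚) [1]
Lü Xueqiang (吕学强) [1]
Li Zhuo (李卓) [1]
Xu Liping (徐丽萍) [2]
Affiliations
[1] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University
[2] Beijing Research Center of Urban System Engineering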
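Funding
Beijing Natural Science Foundation
Keywords
ontology construction; term extraction; reverse part-of-speech rules; reference corpus; term domain degree
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification code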
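Abstract
Domain terms are the basic building blocks of an ontology, and automatically acquiring high-quality domain terms is the foundation for constructing a domain ontology. This paper proposes a multi-strategy method for domain term extraction. First, the grammatical structure and statistical characteristics of domain terms are analyzed in order to construct reverse part-of-speech rules for term extraction and a domain-specific stop-word list. Next, candidate terms are obtained with the PATTree term extraction model and the C-value method. Finally, drawing on TF-IDF and on the idea of comparison against a reference corpus, a term domain degree is computed at two levels, the single document and the domain document collection, and domain terms are selected according to this score. Experiments on an economics corpus show that the top-100, top-500, and top-1500 precision of the extracted domain terms reaches 94.00%, 85.20%, and 78.47%, respectively, improvements of 5%, 4.8%, and 6.2% over the baseline.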
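The abstract names two scoring ideas that a small example may make more concrete: C-value for ranking candidate terms, and comparison against a reference corpus for judging domain specificity. The sketch below is illustrative only and is not the paper's implementation. `c_value` follows the commonly cited C-value definition, in which nested candidates are discounted by the average frequency of the longer candidates that contain them. `domain_ratio` is a hypothetical stand-in for the paper's term domain degree, whose exact formula is not given in this record. The function names, the `smoothing` parameter, and the toy counts are all assumptions.

```python
import math
from collections import defaultdict


def c_value(freq):
    """Score candidate terms with the standard C-value formula.

    freq maps a candidate term (a tuple of words) to its corpus frequency.
    Note: with the log2(length) weight, single-word candidates score 0.
    """
    terms = list(freq)
    # For each candidate, collect the longer candidates that contain it
    # as a contiguous word sequence.
    containers = defaultdict(list)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and any(
                b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
            ):
                containers[a].append(b)

    scores = {}
    for a in terms:
        weight = math.log2(len(a))
        if containers[a]:
            # Nested candidate: subtract the mean frequency of its containers.
            nested = sum(freq[b] for b in containers[a]) / len(containers[a])
            scores[a] = weight * (freq[a] - nested)
        else:
            scores[a] = weight * freq[a]
    return scores


def domain_ratio(f_domain, n_domain, f_ref, n_ref, smoothing=1.0):
    """Hypothetical domain score: log ratio of smoothed relative frequencies
    in the domain corpus versus the reference corpus (an assumption, not the
    paper's formula)."""
    p_domain = (f_domain + smoothing) / (n_domain + smoothing)
    p_ref = (f_ref + smoothing) / (n_ref + smoothing)
    return math.log(p_domain / p_ref)


# Toy counts, invented for illustration.
example_freq = {
    ("exchange", "rate"): 10,
    ("real", "exchange", "rate"): 4,
    ("floating", "exchange", "rate"): 2,
}
example_scores = c_value(example_freq)
```

In this toy input, `("exchange", "rate")` is nested in two longer candidates, so its frequency is discounted before weighting. A positive `domain_ratio` indicates that a term is relatively more frequent in the domain corpus than in the reference corpus.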
Pages: 167-174
Page count: 8