Identifying New Terms and Their Definitions Using Term Definition Patterns and Multiple Features

Cited by: 12

Authors:

Xun Endong (荀恩东)

Li Sheng (李晟)

Affiliation:

[1] Institute of Language Information Processing, Beijing Language and Culture University

Source:

Journal of Computer Research and Development (计算机研究与发展) | 2009, Issue 01

Keywords:

information extraction; term definition pattern; statistical linguistic model; support vector machine; sentence membership degree

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41

Subject classification code:

080203

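Abstract:

Extracting new terms and their definitions is an important research topic in information extraction. Studies show that in scientific and technical literature a new term is often accompanied by its definition, and an examination of real text shows that term definitions have distinctive linguistic features. Drawing on a large-scale corpus of real text, this paper examines the linguistic patterns that make up term definitions, together with the statistical features of the words inside definitions and of the context surrounding terms. It proposes using linguistic patterns of term definitions (LPTD) to produce the set of candidate new terms to be recognized and, taking into account contextual statistical features of new-term occurrences, uses an SVM classifier to identify new terms and their definitions in scientific and technical corpora. The method was validated on a large collection of scientific and technical journals; in an open test, precision was 90.5% and recall reached 78.1%.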
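The first stage described in the abstract, using definition patterns to produce candidate terms, can be illustrated with a minimal Python sketch. The two regular expressions, the `Candidate` type, and the sample text below are hypothetical stand-ins chosen for illustration; they are not the paper's LPTD inventory, and the later SVM stage over contextual features is not shown.

```python
import re
from typing import List, NamedTuple


class Candidate(NamedTuple):
    term: str
    definition: str
    pattern: str


# Hypothetical definition patterns (assumptions, not the paper's LPTD set).
# "suowei": 所谓 TERM [,] 是指/就是/指的是 DEFINITION
# "shizhi": TERM 是指/定义为 DEFINITION, anchored at the sentence start
PATTERNS = {
    "suowei": re.compile(
        r"所谓(?P<term>[^，,。]{2,20}?)[，,]?(?:是指|就是|指的是)(?P<definition>[^。]+)"
    ),
    "shizhi": re.compile(
        r"^(?P<term>[^，,。]{2,20}?)(?:是指|定义为)(?P<definition>[^。]+)"
    ),
}


def split_sentences(text: str) -> List[str]:
    """Split on Chinese sentence-final punctuation, keeping the punctuation."""
    return [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text) if s.strip()]


def extract_candidates(text: str) -> List[Candidate]:
    """Return at most one (term, definition) candidate per sentence."""
    candidates = []
    for sentence in split_sentences(text):
        for name, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                candidates.append(
                    Candidate(
                        term=match.group("term").strip(),
                        definition=match.group("definition").strip(),
                        pattern=name,
                    )
                )
                break  # the first matching pattern wins for this sentence
    return candidates


SAMPLE_TEXT = (
    "所谓信息抽取，是指从文本中自动提取结构化信息的技术。"
    "实验在科技期刊语料上进行。"
    "召回率定义为正确识别的数量与应识别总数之比。"
)

if __name__ == "__main__":
    for candidate in extract_candidates(SAMPLE_TEXT):
        print(candidate.pattern, candidate.term, candidate.definition, sep=" | ")
```

On the sample text, the first and third sentences yield candidates and the second is skipped. In a pipeline like the one the abstract outlines, such candidates would then be passed to a classifier that decides which ones are genuine new terms; hand-written patterns of this kind favor recall at the candidate stage and leave precision to the classifier.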


Pages: 62-69

Page count: 8
