A method for the construction of a probabilistic hierarchical structure based on a statistical analysis of a large-scale corpus

Cited by: 1
Authors
Terai, Asuka [1]
Bin Liu [2]
Nakagawa, Masanori [1]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Ookayama 2-12-1, Tokyo 152, Japan
[2] Nissay Informat Technol Co Ltd, Tokyo, Japan
DOI
10.1109/ICSC.2007.60
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The purpose of this study is to develop a method of constructing a probabilistic hierarchical structure based on a statistical analysis of a Japanese corpus, combining Kameya and Sato's statistical language analysis (7) with Rose's model (10). First, the co-occurrence frequencies of adjectives and nouns are calculated from a Japanese corpus based on modification relations. Second, latent classes are extracted by applying the statistical language analysis to the co-occurrence data. Third, the centroid vectors of the latent classes are calculated from the analysis results, and a probabilistic hierarchical structure of the latent classes is constructed using Rose's model. Finally, the conditional probabilities of the categories given the latent classes are computed as the association probabilities of the latent classes to the categories, and the conditional probabilities of the categories given the concepts are computed as the association probabilities of the concepts to the categories.
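The second step, extracting latent classes from adjective-noun co-occurrence data, can be illustrated with a minimal sketch. It assumes a generic latent class (aspect) model, P(noun, adjective) = sum over c of P(c) P(noun | c) P(adjective | c), fitted by EM. This is a stand-in chosen for illustration and is not necessarily the exact formulation of Kameya and Sato's analysis. The word lists, counts, and function names below are hypothetical.

```python
import random


def fit_latent_classes(counts, n_classes, n_iter=50, seed=0):
    """EM for an assumed model P(n, a) = sum_c P(c) P(n|c) P(a|c).

    counts[i][j] is the co-occurrence frequency of noun i with adjective j.
    Returns (P(c), P(noun | c), P(adjective | c)).
    """
    rng = random.Random(seed)
    n_nouns, n_adjs = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random but strictly positive starting distributions.
    p_c = [1.0 / n_classes] * n_classes
    p_n_c = [normalized([rng.random() + 0.1 for _ in range(n_nouns)])
             for _ in range(n_classes)]
    p_a_c = [normalized([rng.random() + 0.1 for _ in range(n_adjs)])
             for _ in range(n_classes)]

    for _ in range(n_iter):
        new_c = [0.0] * n_classes
        new_n = [[0.0] * n_nouns for _ in range(n_classes)]
        new_a = [[0.0] * n_adjs for _ in range(n_classes)]
        for i in range(n_nouns):
            for j in range(n_adjs):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior P(c | noun i, adjective j).
                joint = [p_c[c] * p_n_c[c][i] * p_a_c[c][j]
                         for c in range(n_classes)]
                z = sum(joint)
                if z == 0.0:
                    continue
                for c in range(n_classes):
                    w = counts[i][j] * joint[c] / z
                    new_c[c] += w
                    new_n[c][i] += w
                    new_a[c][j] += w
        # M-step: renormalize the expected counts.
        total = sum(new_c)
        for c in range(n_classes):
            if new_c[c] > 0.0:
                p_n_c[c] = [v / new_c[c] for v in new_n[c]]
                p_a_c[c] = [v / new_c[c] for v in new_a[c]]
            p_c[c] = new_c[c] / total
    return p_c, p_n_c, p_a_c


def class_given_noun(p_c, p_n_c, noun_index):
    """Bayes' rule: P(c | noun) is proportional to P(c) P(noun | c)."""
    weights = [p_c[c] * p_n_c[c][noun_index] for c in range(len(p_c))]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical toy data: rows are nouns, columns are adjectives.
nouns = ["sky", "sea", "fire", "blood"]
adjectives = ["blue", "wide", "red", "hot"]
counts = [
    [8, 5, 0, 0],
    [6, 9, 0, 0],
    [0, 0, 7, 6],
    [0, 1, 5, 8],
]

p_c, p_n_c, p_a_c = fit_latent_classes(counts, n_classes=2)
for i, noun in enumerate(nouns):
    print(noun, class_given_noun(p_c, p_n_c, i))
```

The P(c | noun) vectors printed at the end are the kind of per-concept class memberships that the later steps could build on. The hierarchical structuring with Rose's model and the category association probabilities are not shown here.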
Pages: 129 / +
Number of pages: 2