Lucene index;
TF-IDF weighting algorithm;
document representation;
vector space model;
DOI:
None available
CLC number:
TP [Automation technology, computer technology];
Subject classification code:
0812;
Abstract:
By studying the index structure of Lucene, an open-source full-text search library, and the vector space model (VSM) for document representation, this article proposes a new method of implementing the VSM. The method combines the Lucene index with the TF-IDF weighting algorithm, so that each document's vector is built from the document's Lucene index rather than from the document itself. Programming against the Lucene index is much easier than processing the raw documents. Finally, the method is used to successfully implement the VSM for a set of HTML documents downloaded from the NBA column of www.sina.com.cn.
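The central idea, computing a document's TF-IDF vector from index data (term frequency within each document and document frequency of each term) instead of from the document text, can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: a toy in-memory inverted index (term → {document id → term frequency}) stands in for Lucene's term dictionary and postings, tokenization is naive whitespace splitting, the weight is the common variant tf × log(N / df) (the article's exact formula is not given here), and the helper names `build_inverted_index`, `tfidf_vectors` and `cosine` are hypothetical. No Lucene API is called.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Hypothetical helper: a toy stand-in for a Lucene index.

    Maps each term to its postings, {doc_id: term frequency}, so the
    document frequency of a term is simply the size of its postings dict.
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)


def tfidf_vectors(index, num_docs):
    """Hypothetical helper: build sparse TF-IDF vectors from the index alone.

    Assumed weighting: w(t, d) = tf(t, d) * log(N / df(t)).
    Only the postings are read here, never the original documents.
    """
    vectors = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage with three made-up snippets (illustrative data only).
docs = {
    "d1": "Lakers win the game",
    "d2": "Celtics win the title",
    "d3": "Lakers trade rumors",
}
index = build_inverted_index(docs)
vectors = tfidf_vectors(index, len(docs))
print(vectors["d1"])
print(cosine(vectors["d1"], vectors["d2"]), cosine(vectors["d1"], vectors["d3"]))
```

In a real Lucene index, the same two quantities are typically already stored: the document frequency of each term in the term dictionary and the per-document term frequencies in the postings. That is why the vectors can, in principle, be assembled without re-parsing the HTML pages.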