Vector Space Model Based on Lucene Index and TF-IDF Weighting Algorithm

Authors
Yang, Xiaodan [1]
Jia, Bo [1]
Affiliations
[1] Zhejiang GongShang Univ, Informat & Elect Engineer Coll, Hangzhou, Zhejiang, Peoples R China
Keywords
Lucene index; TF-IDF weighting algorithm; document expression; vector space model
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
By studying the index structure of Lucene, an open-source full-text search package, and examining the vector space model for document representation, this article proposes a new method of implementing the vector space model. The method combines the Lucene index with the TF-IDF weighting algorithm to represent documents as vectors. With this method, building a document's representation requires only the Lucene index of the document, not the document itself. Programming against the Lucene index is much easier than programming against the raw document. Finally, the article successfully implements the vector space model for a set of HTML documents downloaded from the NBA column of www.sina.com.cn.
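The idea in the abstract, deriving TF-IDF document vectors from an index rather than from the documents themselves, can be illustrated with a minimal sketch. It rests on several assumptions: a plain in-memory inverted index stands in for the Lucene index, the weighting is the textbook tf × log(N/df) form (the article's exact variant is not given in this record), and the names `build_inverted_index`, `tfidf_vectors`, and `cosine` are hypothetical helpers, not Lucene API calls.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}.

    A toy stand-in for a Lucene index (assumption: whitespace tokenization).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def tfidf_vectors(index, num_docs):
    """Build one sparse TF-IDF vector per document using only the index."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        # Document frequency is the length of the term's posting list.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sample documents, used only for illustration.
docs = {
    "d1": "lakers win game",
    "d2": "lakers lose game",
    "d3": "stock market news",
}
index = build_inverted_index(docs)
vectors = tfidf_vectors(index, num_docs=len(docs))
```

Once the index exists, `tfidf_vectors` never touches the original text: term frequencies and document frequencies both come from the posting lists, which is the property the abstract highlights. Comparing `cosine(vectors["d1"], vectors["d2"])` with `cosine(vectors["d1"], vectors["d3"])` shows the two basketball-related documents scoring as more similar than the unrelated pair.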
Pages: 20-23
Number of pages: 4