On a New Model for Automatic Text Categorization Based on Vector Space Model

Cited by: 0
Authors
Suzuki, Makoto [1]
Yamagishi, Naohide [1]
Ishida, Takashi [2]
Goto, Masayuki [2]
Hirasawa, Shigeichi [3]
Affiliations
[1] Shonan Inst Technol, Fac Informat Sci, 1-1-25 Tsujido Nishikaigan, Kanagawa 2518511, Japan
[2] Waseda Univ, Shinjuku Ku, Tokyo 169, Japan
[3] Cyber Univ, Shinjuku Ku, Tokyo 162, Japan
Keywords
text mining; classification; N-gram; newspaper
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM). It is a simple technique that adds up the ratios of term frequencies among categories, and it can use an unlimited number of index terms. We then improved FRAM by adopting character N-grams as index terms. However, FRAM lacked a satisfactory mathematical basis. We therefore present a new mathematical model based on the vector space model and consider its implications. We evaluate the proposed method in several experiments in which we classify newspaper articles from the English Reuters-21578 data set and the Japanese CD-Mainichi 2002 data set. The Reuters-21578 data set is a benchmark data set for automatic text categorization. The results show that FRAM has good classification accuracy: the micro-averaged F-measure of the proposed method is 92.2% for English. The proposed method performs classification with a single program and is language-independent.
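The abstract describes FRAM only in words, so the following is a minimal Python sketch of the idea rather than the authors' implementation. It assumes that the "ratio" of a term for a category is its frequency in that category divided by its total frequency over all categories, and that a document is assigned to the category with the largest accumulated ratio. The function names (`char_ngrams`, `train`, `classify`) and the choice of `n=3` are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=3):
    """Build per-category frequency ratios from (text, category) pairs.

    Assumption: ratio(term, category) =
        freq(term, category) / sum of freq(term, c) over all categories c.
    """
    freq = defaultdict(Counter)  # category -> term -> count
    for text, category in labeled_docs:
        freq[category].update(char_ngrams(text, n))

    totals = Counter()  # term -> count over all categories
    for counts in freq.values():
        totals.update(counts)

    return {
        category: {term: k / totals[term] for term, k in counts.items()}
        for category, counts in freq.items()
    }


def classify(text, ratios, n=3):
    """Accumulate the ratios of the document's terms for each category
    and return the best-scoring category together with all scores."""
    terms = char_ngrams(text, n)
    scores = {
        category: sum(r.get(term, 0.0) for term in terms)
        for category, r in ratios.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    docs = [
        ("wheat corn harvest", "grain"),
        ("crude oil barrel", "energy"),
    ]
    model = train(docs)
    label, scores = classify("oil barrel prices", model)
    print(label, scores)
```

Because the index terms are raw character N-grams, the sketch needs no tokenizer or stop-word list, which is consistent with the abstract's claim that one program can handle both English and Japanese text.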
Pages: 3152-3159
Page count: 8
Related papers
  • [1] On a new model for automatic text categorization based on vector space model
    Faculty of Information Science, Shonan Institute of Technology, 1-1-25 Tsujido Nishikaigan, Fujisawa, Kanagawa, 251-8511, Japan
    Conf. Proc. IEEE Int. Conf. Syst. Man Cybern., 2010 : 3152 - 3159
  • [2] Beyond TFIDF Weighting for Text Categorization in the Vector Space Model
    Soucy, Pascal
    Mineau, Guy W.
    19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1130 - 1135
  • [3] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [4] Using kNN model for automatic text categorization
    Gongde Guo
    Hui Wang
    David Bell
    Yaxin Bi
    Kieran Greer
    Soft Computing, 2006, 10 : 423 - 430
  • [5] Fast text categorization based on a novel class space model
    Gao, Yingfan
    Ma, Runbo
    Liu, Yushu
    MICAI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4293 : 1007 - +
  • [6] A Text Categorization Method using Extended Vector Space Model by Frequent Term Sets
    Yuan, Man
    Ouyang, Yuan Xin
    Xiong, Zhang
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2013, 29 (01) : 99 - 114
  • [7] Summarization of Text Clustering based Vector Space Model
    Chen, Mingzhen
    Song, Yu
    2009 IEEE 10TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED INDUSTRIAL DESIGN & CONCEPTUAL DESIGN, VOLS 1-3: E-BUSINESS, CREATIVE DESIGN, MANUFACTURING - CAID&CD'2009, 2009, : 2362 - 2365
  • [8] Automatic Text Summarization: A New Hybrid Model Based on Vector Space Modelling, Fuzzy Logic and Rhetorical Structure Analysis
    Ben Ayed, Alaidine
    Biskri, Ismail
    Meunier, Jean-Guy
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT II, 2019, 11684 : 26 - 34
  • [9] A new Centroid-Based Classification model for text categorization
    Liu, Chuan
    Wang, Wenyong
    Tu, Guanghui
    Xiang, Yu
    Wang, Siyang
    Lv, Fengmao
    KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 15 - 26
  • [10] Text Categorization Based on Topic Model
    School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu Province 221116, China
    100081, China
    Int. J. Comput. Intell. Syst., 2009, 4 : 398 - 409