On a New Model for Automatic Text Categorization Based on Vector Space Model

Cited by: 0
Authors
Suzuki, Makoto [1]
Yamagishi, Naohide [1]
Ishida, Takashi [2]
Goto, Masayuki [2]
Hirasawa, Shigeichi [3]
Affiliations
[1] Shonan Inst Technol, Fac Informat Sci, 1-1-25 Tsujido Nishikaigan, Kanagawa 2518511, Japan
[2] Waseda Univ, Shinjuku Ku, Tokyo 169, Japan
[3] Cyber Univ, Shinjuku Ku, Tokyo 162, Japan
Keywords
text mining; classification; N-gram; newspaper
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM). This is a simple technique that adds up the ratios of term frequencies among categories, and it can use an unlimited number of index terms. We then improved FRAM by adopting character N-grams to form the index terms. However, FRAM lacked a satisfactory mathematical basis. Therefore, we present here a new mathematical model based on the Vector Space Model and consider its implications. The proposed method is evaluated in several experiments, in which we classify newspaper articles from the English Reuters-21578 data set and the Japanese CD-Mainichi 2002 data set. The Reuters-21578 data set is a benchmark data set for automatic text categorization. The results show that FRAM achieves good classification accuracy. Specifically, the micro-averaged F-measure of the proposed method is 92.2% for English. The proposed method performs classification with a single program and is language-independent.
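The abstract describes FRAM only in words, so the following is a minimal sketch of one plausible reading rather than the authors' implementation. It assumes that the "frequency ratio" of a term for a category is that term's frequency in the category divided by its total frequency across all categories, and that a document's score for a category is the sum of those ratios over the document's character N-grams. The function names (`char_ngrams`, `train`, `classify`), the choice of n = 3, and the toy training texts are all hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of text (assumed index terms)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=3):
    """Build per-category frequency ratios from (text, category) pairs.

    Assumption: ratio(term, category) = freq(term, category) / total freq(term).
    """
    counts = defaultdict(Counter)  # category -> term -> frequency
    for text, category in labeled_docs:
        counts[category].update(char_ngrams(text, n))

    totals = Counter()  # term -> frequency summed over all categories
    for term_counts in counts.values():
        totals.update(term_counts)

    return {
        category: {term: freq / totals[term] for term, freq in term_counts.items()}
        for category, term_counts in counts.items()
    }


def classify(text, ratios, n=3):
    """Accumulate the ratios of the document's n-grams and pick the best category."""
    terms = char_ngrams(text, n)
    scores = {
        category: sum(term_ratios.get(term, 0.0) for term in terms)
        for category, term_ratios in ratios.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    training = [
        ("wheat corn harvest", "grain"),
        ("crude oil barrel", "oil"),
    ]
    ratios = train(training)
    label, scores = classify("corn harvest", ratios)
    print(label, scores)
```

Because the sketch works on raw characters and needs no tokenizer, the same code runs unchanged on English or Japanese text, which is consistent with the language-independence claim in the abstract. No term selection step is applied, so every observed N-gram contributes to the score.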
Pages: 3152-3159
Number of pages: 8
Related papers
50 in total
  • [31] A New Vector Space Model Based on the Deep Learning
    Karamti, Hanen
    Tmar, Mohamed
    Gargouri, Faiez
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT VI, 2017, 10639 : 750 - 758
  • [32] Coordinate Model for Text Categorization
    Jiang, Wei
    Chen, Lei
    TRANSACTIONS ON EDUTAINMENT V, 2011, 6530 : 214 - 223
  • [33] A method for automatic determination of the feature vector size for text categorization
    Fragoso, Rogerio C. P.
    Pinheiro, Roberto H. W.
    Cavalcanti, George D. C.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 259 - 264
  • [34] A text categorization model based on Hidden Markov models
    Yi, K
    Beheshti, J
    CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE, 2003, 27 (03): : 149 - 149
  • [35] Research of Text Categorization Model based on Random Forests
    Xue, Dashen
    Li, Fengxin
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION TECHNOLOGY CICT 2015, 2015, : 173 - 176
  • [36] A Concept-based Model for Enhancing Text Categorization
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 629 - 637
  • [37] Research on text categorization model based on LDA - KNN
    Chen, Weihua
    Zhang, Xian
    2017 IEEE 2ND ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2017, : 2719 - 2726
  • [38] Plagiarism Detection in Text using Vector Space Model
    Ekbal, Asif
    Saha, Sriparna
    Choudhary, Gaurav
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 366 - 371
  • [39] Text representation combining syntax in vector space model
    Liu P.-Y.
    Yang Y.-Z.
    Zhao J.
    Advances in Information Sciences and Service Sciences, 2011, 3 (07): : 251 - 259
  • [40] Study on the Classification of Mixed Text Based on Conceptual Vector Space Model and Bayes
    Li, Yaxiong
    Hu, Dan
    2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 269 - 272