On a New Model for Automatic Text Categorization Based on Vector Space Model

Authors
Suzuki, Makoto [1]
Yamagishi, Naohide [1]
Ishida, Takashi [2]
Goto, Masayuki [2]
Hirasawa, Shigeichi [3]
Affiliations
[1] Shonan Inst Technol, Fac Informat Sci, 1-1-25 Tsujido Nishikaigan, Kanagawa 2518511, Japan
[2] Waseda Univ, Shinjuku Ku, Tokyo 169, Japan
[3] Cyber Univ, Shinjuku Ku, Tokyo 162, Japan
Keywords
text mining; classification; N-gram; newspaper
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In our previous paper, we proposed a new classification technique called the Frequency Ratio Accumulation Method (FRAM). This simple technique adds up the ratios of term frequencies among categories and can use an unlimited number of index terms. We then improved FRAM by adopting character N-grams to form index terms. However, FRAM lacked a satisfactory mathematical basis. We therefore present a new mathematical model based on the vector space model and consider its implications. The proposed method is evaluated in several experiments in which we classify newspaper articles from the English Reuters-21578 data set and the Japanese CD-Mainichi 2002 data set. Reuters-21578 is a benchmark data set for automatic text categorization. The results show that FRAM has good classification accuracy: the micro-averaged F-measure of the proposed method is 92.2% for English. The proposed method is language-independent and can perform classification with a single program.
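The accumulation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the ratio used here (a term's frequency in one category divided by its total frequency over all categories) is an assumption, as are the function names `char_ngrams`, `train`, and `classify`, the choice of n = 3, and the toy training data.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=3):
    """Build per-term frequency ratios from (text, category) pairs.

    Assumed definition (not taken from the paper):
    ratio[term][c] = freq(term, c) / sum of freq(term, c') over all c'.
    """
    freq = defaultdict(Counter)  # term -> Counter mapping category -> count
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            freq[term][category] += 1
    ratios = {}
    for term, by_category in freq.items():
        total = sum(by_category.values())
        ratios[term] = {c: k / total for c, k in by_category.items()}
    return ratios


def classify(text, ratios, n=3):
    """Accumulate the ratios of every n-gram in `text`; return the top category."""
    scores = Counter()
    for term in char_ngrams(text, n):
        # N-grams never seen in training contribute nothing.
        for category, ratio in ratios.get(term, {}).items():
            scores[category] += ratio
    if not scores:
        return None  # no n-gram of the text was seen during training
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
training = [
    ("wheat corn harvest grain", "grain"),
    ("crude oil barrel price", "crude"),
]
ratios = train(training)
print(classify("oil price rises", ratios))
print(classify("corn harvest", ratios))
```

Because the index terms are raw character n-grams, the same code runs unchanged on text in any language, which is consistent with the language independence claimed in the abstract.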
Pages: 3152-3159
Number of pages: 8