Automatic Chinese Text Classification Using Character-based and Word-based Approach

Cited by: 4
Authors
Luo, Xi [1]
Ohyama, Wataru [1]
Wakabayashi, Tetsushi [1]
Kimura, Fumitaka [1]
Affiliation
[1] Mie Univ, Grad Sch Engn, Tsu, Mie 514, Japan
Source
2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR) | 2013
Keywords
Chinese Text Classification/Categorization; N-gram; Feature Transformation; Dimension Reduction; Principal Component Analysis; Support Vector Machine;
DOI
10.1109/ICDAR.2013.73
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. We also introduce a weight coefficient that can be used to give higher weights to word features. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
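The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_features`, the `word_weight` parameter, the simple count-based weighting, and the pre-segmented `sample_words` list are all assumptions made for illustration. The record does not say how the text is segmented or how the weight coefficient is applied. The later stages (feature transformation, dimension reduction, and classification) are not covered here.

```python
from collections import Counter


def extract_features(text, words, word_weight=2.0):
    """Combine character uni-grams, character bi-grams, and long-word features.

    text: raw string to classify.
    words: tokens from some word segmenter (assumed to be given).
    word_weight: hypothetical coefficient that up-weights word features.
    """
    # Drop whitespace so n-grams are built from characters only.
    chars = [c for c in text if not c.isspace()]
    features = Counter()

    # Character uni-grams, each occurrence counted with weight 1.
    for c in chars:
        features[("uni", c)] += 1.0

    # Character bi-grams from adjacent character pairs.
    for a, b in zip(chars, chars[1:]):
        features[("bi", a + b)] += 1.0

    # Word features, kept only when the word has three or more characters,
    # and scaled by the assumed weight coefficient.
    for w in words:
        if len(w) >= 3:
            features[("word", w)] += word_weight

    return features


sample_text = "自然语言处理很有趣"
sample_words = ["自然语言处理", "很", "有趣"]  # assumed segmenter output
features = extract_features(sample_text, sample_words, word_weight=2.0)
```

Restricting word features to length three or more avoids duplicating information that the uni-gram and bi-gram features already carry for one- and two-character words.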
Pages: 329-333
Number of pages: 5