A Chinese Text Classification Model Based on Radicals and Character Distinctions

Cited by: 1
Authors
Yan-Xin, Huang [1]
Bo, Li [1]
Affiliations
[1] Chongqing Univ Technol, Coll Comp Sci & Engn, Chongqing 400054, Peoples R China
Keywords
Semantics; Bit error rate; Text categorization; Feature extraction; Deep learning; Transformers; Data mining; China; Radicals; traditional Chinese; Chinese text classification
DOI
10.1109/ACCESS.2023.3257339
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Chinese characters are generally correlated with their semantic meanings, and the structure of their radicals in particular can clearly indicate how characters are related to each other. During the simplification of Chinese characters, some distinct traditional characters were merged into a single simplified character (a many-to-one mapping), so that one simplified character can correspond to several traditional characters. Compared with simplified characters, traditional characters carry richer structural information, which is also more informative for semantic understanding. Conventional text modelling approaches often overlook the structural content of Chinese characters and the role of human cognitive behaviour in text comprehension. We therefore propose a Chinese text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches, simplified and traditional, each with an attention module based on radical classification. Specifically, we first develop a sequential modelling structure to capture the sequence information of Chinese texts. Next, an associated-word module that uses the radical as a medium is designed to filter out keywords with high semantic differentiation among the auxiliary units. An attention module then balances the importance of each keyword in a particular context. We evaluate the proposed method on three datasets to demonstrate its validity and plausibility.
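The many-to-one mapping mentioned in the abstract can be illustrated with a minimal Python sketch. The character pairs below are a small hand-picked sample chosen for illustration, and the names `TRAD_TO_SIMP`, `invert`, `simplify`, and `ambiguous_chars` are hypothetical; none of this is taken from the paper's data or code. The sketch only shows why a simplified character cannot always be converted back to a unique traditional one.

```python
from collections import defaultdict

# Hand-picked illustrative pairs (traditional -> simplified).
# Assumption: a tiny sample for demonstration, not the paper's mapping table.
TRAD_TO_SIMP = {
    "發": "发",  # to emit, to send out
    "髮": "发",  # hair
    "乾": "干",  # dry
    "幹": "干",  # trunk; to do
    "後": "后",  # behind, after
    "后": "后",  # queen, empress
}


def invert(mapping):
    """Group traditional characters by the simplified character they map to."""
    groups = defaultdict(list)
    for trad, simp in mapping.items():
        groups[simp].append(trad)
    return dict(groups)


SIMP_TO_TRAD = invert(TRAD_TO_SIMP)


def simplify(text):
    """Replace each known traditional character; leave other characters unchanged."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def ambiguous_chars(text):
    """Return the characters in `text` that have several traditional candidates."""
    return [ch for ch in text if len(SIMP_TO_TRAD.get(ch, [])) > 1]
```

In this sample, `simplify("髮")` and `simplify("發")` give the same simplified character, so the distinction between "hair" and "to emit" is only visible in the traditional form. That lost distinction is the kind of information the abstract's traditional branch is meant to retain.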
Pages: 45520-45526
Number of pages: 7