A Chinese Text Classification Model Based on Radicals and Character Distinctions

Cited by: 1
Authors
Huang, Yan-Xin [1]
Li, Bo [1]
Affiliations
[1] Chongqing Univ Technol, Coll Comp Sci & Engn, Chongqing 400054, Peoples R China
Keywords
Semantics; Bit error rate; Text categorization; Feature extraction; Deep learning; Transformers; Data mining; China; Radicals; Traditional Chinese; Chinese text classification
DOI
10.1109/ACCESS.2023.3257339
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Chinese characters are generally correlated with their semantic meanings, and the structure of radicals in particular can clearly indicate how characters are related to each other. During the simplification of Chinese characters, some distinct traditional characters were merged into a single simplified character (a many-to-one mapping), resulting in the phenomenon of 'one simplified character corresponding to many traditional characters'. Compared with simplified characters, traditional characters contain richer structural information, which is also more meaningful for semantic understanding. Traditional approaches to text modelling often overlook the structural content of Chinese characters and the role of human cognitive behaviour in text comprehension. Hence, we propose a Chinese text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches, simplified and traditional, each with an attention module based on radical classification. Specifically, we first develop a sequential modelling structure to obtain sequence information from Chinese texts. Afterwards, an associated-word module that uses the radical as a medium is designed to filter out keywords with high semantic differentiation among the auxiliary units. An attention module is then implemented to balance the importance of each keyword in a particular context. We evaluate the proposed method on three datasets to demonstrate its validity and plausibility.
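As a rough illustration of two ideas mentioned in the abstract, the following Python sketch shows (a) the many-to-one relationship between traditional and simplified characters and (b) a generic softmax attention that weights keyword vectors against a context vector. It is a minimal sketch under stated assumptions and not the authors' model: the character table `TRAD_TO_SIMP` is a small hand-picked sample, and the function names, the dot-product scoring, and the toy vectors are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict

# Illustrative traditional -> simplified pairs (a tiny hand-picked sample,
# not data from the paper).
TRAD_TO_SIMP = {
    "發": "发",  # to emit, to send out
    "髮": "发",  # hair
    "乾": "干",  # dry
    "幹": "干",  # to do; trunk
}


def simplified_to_traditional(mapping):
    """Invert the mapping to expose the many-to-one relationship."""
    inverse = defaultdict(list)
    for trad, simp in mapping.items():
        inverse[simp].append(trad)
    return dict(inverse)


def attention_pool(vectors, query):
    """Generic softmax attention (an assumption, not the paper's module).

    Scores each keyword vector by its dot product with `query`, turns the
    scores into weights with a softmax, and returns the weights together
    with the weighted sum of the vectors.
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return weights, pooled


# One simplified character can correspond to several traditional ones.
candidates = simplified_to_traditional(TRAD_TO_SIMP)

# Toy 2-dimensional keyword vectors and a context (query) vector.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
```

Here `candidates["发"]` lists both 發 and 髮, which is the ambiguity a traditional-character branch could help resolve, and the keyword vector that aligns with the query receives the larger attention weight.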
Pages: 45520-45526
Page count: 7
Related papers
50 in total
  • [31] An Automatic Matching Model for Chinese Test Questions and Knowledge Points Based on Text Classification
    Li, Yancong
    Shao, Zengzhen
    Sun, Hongxu
    Zhao, Xuechen
    Guo, Yanhui
    2018 11TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2018, : 352 - 355
  • [32] A Chinese text classification algorithm based on granular computing
    Qiu, Taorong
    Huang, Houkuan
    Liu, Qing
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 4042 - +
  • [33] Chinese web page classification based on text contents
    Liang, JZ
    ISTM/2003: 5TH INTERNATIONAL SYMPOSIUM ON TEST AND MEASUREMENT, VOLS 1-6, CONFERENCE PROCEEDINGS, 2003, : 4733 - 4736
  • [34] Chinese Text Sentiment Classification based on Granule Network
    Zhang Xia
    Wang Suzhen
    Xu Mingzhu
    Yin Yixin
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2009), 2009, : 775 - +
  • [35] Short Chinese Text Classification Based on Correlation Analysis
    Zheng, Chenyang
    Usagawa, Tsuyoshi
    PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGY AND SYSTEMS (ICTS), 2017, : 265 - 268
  • [36] Integrated features based sentiment classification for Chinese text
    Gan, Xiaohong
    Journal of Convergence Information Technology, 2012, 7 (19) : 450 - 458
  • [37] A vector-based algorithm for Chinese text classification
    Luo, CR
    He, TT
    PACLIC 17: Language, Information and Computation, Proceedings, 2003, : 235 - 242
  • [38] The Instructional Design of Chinese Text Classification based on SVM
    Wei, Sichao
    Guo, Jianyi
    Yu, Zhengtao
    Chen, Peng
    Xian, Yantuan
    2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 5114 - 5117
  • [39] Chinese Text Classification Based on Ant Colony Optimization
    Luo Xin
    PROCEEDINGS OF THE 2015 4TH NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (NCEECE 2015), 2016, 47 : 37 - 41
  • [40] Imbalanced Chinese Text Classification Based on Weighted Sampling
    Li, Hu
    Zou, Peng
    Han, WeiHong
    Xia, Rongze
    TRUSTWORTHY COMPUTING AND SERVICES, 2014, 426 : 38 - 45