A Chinese Text Classification Model Based on Radicals and Character Distinctions

Cited by: 1
Authors
Yan-Xin, Huang [1]
Bo, Li [1]
Affiliations
[1] Chongqing Univ Technol, Coll Comp Sci & Engn, Chongqing 400054, Peoples R China
Keywords
Semantics; Bit error rate; Text categorization; Feature extraction; Deep learning; Transformers; Data mining; China; Radicals; traditional Chinese; Chinese text classification
DOI
10.1109/ACCESS.2023.3257339
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Chinese characters are generally correlated with their semantic meanings, and the structure of radicals in particular can give a clear indication of how characters are related to each other. During the simplification of Chinese characters, some distinct traditional characters were merged into a single simplified character (a many-to-one mapping), so that one simplified character can correspond to several traditional characters. Compared with simplified characters, traditional characters contain richer structural information, which is also more useful for semantic understanding. Conventional approaches to text modelling often overlook the structural content of Chinese characters and the role of human cognitive behaviour in text comprehension. Hence, we propose a Chinese text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches, simplified and traditional, each with an attention module based on radical classification. Specifically, we first develop a sequential modelling structure to capture the sequence information of Chinese texts. Next, an associated-word module that uses the radical as a medium is designed to select keywords with high semantic differentiation from among the auxiliary units. An attention module is then applied to balance the importance of each keyword in a particular context. We evaluate the proposed method on three datasets to demonstrate its validity and plausibility.
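The many-to-one mapping mentioned in the abstract can be illustrated with a small sketch. The character pairs below are a hand-picked sample of well-known mergers, not a complete conversion table, and nothing here is taken from the paper's implementation; the names `TRAD_TO_SIMP`, `invert_mapping`, and `ambiguous_simplified` are hypothetical and introduced only for this example.

```python
from collections import defaultdict

# Illustrative traditional -> simplified pairs (assumption: a tiny sample
# chosen for this example, not the paper's data or a full table).
TRAD_TO_SIMP = {
    "發": "发",  # to emit, to send out
    "髮": "发",  # hair
    "後": "后",  # behind, after
    "后": "后",  # empress
    "乾": "干",  # dry (in this sense)
    "幹": "干",  # to do; trunk
    "干": "干",  # shield; to concern
    "語": "语",  # language (a one-to-one case)
}


def invert_mapping(trad_to_simp):
    """Group traditional characters by the simplified character they map to."""
    simp_to_trad = defaultdict(list)
    for trad, simp in trad_to_simp.items():
        simp_to_trad[simp].append(trad)
    return dict(simp_to_trad)


def ambiguous_simplified(trad_to_simp):
    """Keep only simplified characters that have several traditional sources."""
    inverted = invert_mapping(trad_to_simp)
    return {simp: trads for simp, trads in inverted.items() if len(trads) > 1}


if __name__ == "__main__":
    for simp, trads in ambiguous_simplified(TRAD_TO_SIMP).items():
        print(simp, "<-", " / ".join(trads))
```

In this sample, a simplified-only view collapses the difference between "hair" and "to emit", which is the kind of distinction a separate traditional-character branch could in principle preserve.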
Pages: 45520-45526
Number of pages: 7