Analysis on Chinese quantitative stylistic features based on text mining

Cited by: 2
Authors
Hou, Renkui [1]
Jiang, Minghu [1]
Affiliations
[1] Tsinghua Univ, Sch Humanities, Lab Computat Linguist, Beijing 100084, Peoples R China
DOI
10.1093/llc/fqu067
Chinese Library Classification number
C [Social Sciences, General]
Subject classification codes
03; 0303
Abstract
This article uses data mining to examine whether linguistic features, taking parts of speech (POS) as an example, can serve as quantitative stylistic features of Chinese. Put another way, its purpose is to explore a method for determining such features. Texts in six styles (news, science, official, art, TV conversation, and daily conversation) were selected to build the corpus for the study. Text vectors characterized by POS were analyzed by principal component analysis and clustered by an agglomerative hierarchical clustering method. The results of both indicate that POS can be used as a distinctive feature of texts. Then, a support vector machine was used to build a classification model on training data, and precision and recall rates were used to validate the text classification results. A random forest was used to compute the importance of each POS, i.e. its contribution to classification, and text vectors characterized by the important POS were then clustered and classified. The experimental results show that POS can be taken as a quantitative stylistic feature of Chinese, and that clustering and classification results are better when the 60 most important POS are taken as the features of texts.
Pages: 357-367
Page count: 11
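
As a rough illustration of the first steps described in the abstract (representing each text as a POS vector, then clustering the vectors agglomeratively), here is a minimal standard-library sketch. The tag labels, the toy tag sequences, the choice of relative frequencies, Euclidean distance, and average linkage are all assumptions made for this example; the abstract does not specify them, and the article's PCA, support vector machine, and random forest steps are not shown.

```python
from collections import Counter
from itertools import combinations
from math import dist


def pos_vector(tags, tagset):
    """Relative frequency of each POS tag in one text."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[t] / total for t in tagset]


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of text indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Toy, hand-made POS sequences (hypothetical; not data from the article).
TAGSET = ["n", "v", "a", "r"]  # assumed labels: noun, verb, adjective, pronoun
texts = {
    "news_1": ["n", "n", "v", "n", "a", "n", "v", "n"],
    "news_2": ["n", "v", "n", "v", "n", "a", "n", "v"],
    "chat_1": ["r", "v", "r", "v", "a", "r", "v", "n"],
    "chat_2": ["r", "v", "v", "r", "r", "r", "v", "n"],
}

names = list(texts)
vectors = [pos_vector(texts[name], TAGSET) for name in names]
clusters = agglomerative(vectors, n_clusters=2)
grouped = [[names[i] for i in cluster] for cluster in clusters]
print(grouped)
```

In this toy setup the noun-heavy sequences and the pronoun-heavy sequences end up in separate clusters, which mirrors, on a very small scale, the idea that POS distributions can separate styles. A real study would use a tagged corpus and a full tag inventory.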