Exploiting effective features for chinese sentiment classification

Cited by: 65
Authors
Zhai, Zhongwu [1]
Xu, Hua [1]
Kang, Bada [2]
Jia, Peifa [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ So Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China
Keywords
Sentiment classification; Substring features; Substring-group; Suffix tree
DOI
10.1016/j.eswa.2011.01.047
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Features play a fundamental role in sentiment classification. How to effectively select different types of features to improve sentiment classification performance is the primary topic of this paper. N-gram features are commonly employed in text classification tasks; in this paper, sentiment words, substrings, substring-groups, and key-substring-groups, which have not previously been considered in the sentiment classification area, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate generality, we use two authoritative Chinese data sets in different domains to conduct our experiments. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the N-gram features; (3) substring-group features have greater potential to improve the performance of sentiment classification by combining substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features are usually of varying rather than fixed lengths. (C) 2011 Elsevier Ltd. All rights reserved.
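The abstract contrasts fixed-length character N-grams (bigrams performing best) with variable-length substring features. The sketch below only illustrates those two feature types under stated assumptions; it is not the authors' implementation. The function names and thresholds (`min_len`, `max_len`, `min_df`) are hypothetical, and the brute-force substring enumeration is a plainly swapped-in stand-in for the suffix-tree extraction named in the keywords. Substring-groups and key-substring-groups are not modeled here.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams; n=2 yields character bigrams.

    Chinese text has no spaces between words, so characters are used
    directly without word segmentation.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def frequent_substrings(docs: list[str], min_len: int = 2, max_len: int = 4,
                        min_df: int = 2) -> set[str]:
    """Keep substrings of length min_len..max_len that occur in >= min_df docs.

    Hypothetical brute-force stand-in for suffix-tree extraction: it
    enumerates every substring of each document, which is quadratic in
    document length but simple to read.
    """
    doc_freq = Counter()
    for doc in docs:
        # Use a set so each substring counts at most once per document.
        seen = {doc[i:i + k]
                for k in range(min_len, max_len + 1)
                for i in range(len(doc) - k + 1)}
        doc_freq.update(seen)
    return {s for s, c in doc_freq.items() if c >= min_df}


def featurize(text: str, vocab: set[str]) -> dict[str, int]:
    """Binary presence features over a selected substring vocabulary."""
    return {s: 1 for s in vocab if s in text}


if __name__ == "__main__":
    reviews = ["这个手机很好", "电脑很好用"]
    print(char_ngrams("很好很好"))
    vocab = frequent_substrings(reviews)
    print(vocab)
    print(featurize("服务很好", vocab))
```

Because the substring vocabulary mixes lengths, it can capture the "varying length" features the abstract reports as effective, whereas `char_ngrams` fixes the length in advance.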
Pages: 9139-9146
Number of pages: 8
Related papers
50 in total
  • [41] An Empirical Study of Unsupervised Sentiment Classification of Chinese Reviews
    翟忠武
    徐华
    贾培发
    Tsinghua Science and Technology, 2010, 15 (06) : 702 - 708
  • [42] Combining Emojis with Arabic Textual Features for Sentiment Classification
    Al-Azani, Sadam
    El-Alfy, El-Sayed M.
    2018 9TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2018, : 139 - 144
  • [43] Generating Fluent Chinese Adversarial Examples for Sentiment Classification
    Wang, Congyi
    Zeng, Jianping
    Wu, Chengrong
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2020, : 149 - +
  • [44] Sentiment Classification of Chinese Microblogging Texts with Global RNN
    Cheng, Jiajun
    Li, Pei
    Ding, Zhaoyun
    Zhang, Sheng
    Wang, Hui
    2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016), 2016, : 653 - 657
  • [45] Detecting Dependency-Related Sentiment Features for Aspect-Level Sentiment Classification
    Zhang, Xing
    Xu, Jingyun
    Cai, Yi
    Tan, Xingwei
    Zhu, Changxi
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (01) : 196 - 210
  • [47] Exploiting and integrating rich features for biological literature classification
    Wang, Hongning
    Huang, Minlie
    Ding, Shilin
    Zhu, Xiaoyan
    BMC BIOINFORMATICS, 2008, 9 (Suppl 3)
  • [48] Sentiment classification of Chinese Weibo based on extended sentiment dictionary and organisational structure of comments
    Wei, Zhongliang
    Liu, Wenjuan
    Zhu, Guangli
    Zhang, Shunxiang
    Hsieh, Meng-Yen
    CONNECTION SCIENCE, 2022, 34 (01) : 409 - 428
  • [49] Sentiment Groups as Features of a Classification Model Using a Spanish Sentiment Lexicon: A Hybrid Approach
    Gutierrez, Ernesto
    Cervantes, Ofelia
    Baez-Lopez, David
    Alfredo Sanchez, J.
    PATTERN RECOGNITION (MCPR 2015), 2015, 9116 : 258 - 268
  • [50] Exploiting Out-of-Domain Datasets and Visual Representations for Image Sentiment Classification
    Pournaras, Alexandros
    Gkalelis, Nikolaos
    Galanopoulos, Damianos
    Mezaris, Vasileios
    2021 16TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION & PERSONALIZATION (SMAP 2021), 2021, : 42 - 47