Exploiting effective features for Chinese sentiment classification

Cited by: 65
Authors
Zhai, Zhongwu [1]
Xu, Hua [1]
Kang, Bada [2]
Jia, Peifa [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ So Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China;
Keywords
Sentiment classification; Substring features; Substring-group; Suffix tree;
DOI
10.1016/j.eswa.2011.01.047
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Features play a fundamental role in sentiment classification. How to effectively select different types of features to improve sentiment classification performance is the primary topic of this paper. Ngram features are commonly employed in text classification tasks; in this paper, sentiment-words, substrings, substring-groups, and key-substring-groups, which have never been considered in the sentiment classification area before, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate generality, we use two authoritative Chinese data sets in different domains to conduct our experiments. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the Ngram features; (3) substring-group features have greater potential to improve the performance of sentiment classification by combining substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features are usually of varying lengths rather than fixed lengths. (C) 2011 Elsevier Ltd. All rights reserved.
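As a rough illustration of the feature types named in the abstract, the sketch below extracts character n-grams (bigrams when n = 2) and counts variable-length substrings from raw text. It is only a minimal sketch: the function names `char_ngrams` and `substring_counts` are hypothetical, and the naive enumeration stands in for the suffix-tree construction listed in the keywords; it does not reproduce the paper's substring-group method.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    # For text shorter than n, the range is empty and the result is [].
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def substring_counts(docs, max_len):
    """Count every substring of length 1..max_len across all docs.

    Assumption: naive enumeration for illustration only; a suffix tree
    would be the efficient way to gather substrings of all lengths.
    """
    counts = Counter()
    for doc in docs:
        for n in range(1, max_len + 1):
            counts.update(char_ngrams(doc, n))
    return counts


# Character bigrams need no word segmentation, which is convenient for Chinese.
print(char_ngrams("很好看", 2))  # ['很好', '好看']
```

Fixed-length n-grams correspond to a single value of n, whereas `substring_counts` mixes lengths, which is the property the abstract's finding (5) refers to.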
Pages: 9139 - 9146
Number of pages: 8
Related papers
50 in total
  • [21] Sentiment Polarity Classification using Structural Features
    Ansari, Daniel
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015, : 1270 - 1273
  • [22] Leveraging multiple features for document sentiment classification
    Kong, Li
    Li, Chuanyi
    Ge, Jidong
    Zhang, FeiFei
    Feng, Yi
    Li, Zhongjin
    Luo, Bin
    INFORMATION SCIENCES, 2020, 518 : 39 - 55
  • [23] Sentiment Classification Analysis of Chinese Microblog Network
    Wang, Xiaotian
    Zhang, Chuang
    Wu, Ming
    COMPLEX NETWORKS VI, 2015, 597 : 123 - 129
  • [24] Chinese Microblog Sentiment Classification Based on Deep Belief Nets with Extended Multi-modality Features
    Sun, Xiao
    Li, Chengcheng
    Xu, Wanyi
    Ren, Fuji
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2014, : 928 - 935
  • [25] Exploiting Dependency Relations for Sentence Level Sentiment Classification using SVM
    Paramesha, K.
    Ravishankar, K. C.
    2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015,
  • [26] Chinese Reviews Sentiment Classification Based on Quantified Sentiment Lexicon and Fuzzy Set
    Wang, Bingkun
    Min, Yulin
    Huang, Yongfeng
    Liu, Yusi
    Li, Xing
    Sun, Yubao
    Sun, Chaowei
    2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 677 - 680
  • [27] A Hybrid Method for Sentiment Classification in Chinese Movie Reviews Based on Sentiment Labels
    Zhao, Kai
    Jin, Yaohong
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 86 - 89
  • [28] Effective Use of Linguistic Features for Sentiment Analysis of Korean
    Jang, Hayeon
    Shin, Hyopil
    PROCEEDINGS OF THE 24TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2010, : 173 - 182
  • [29] Sentiment Classification of Tweets with Non-Language Features
    Akilandeswari, J.
    Jothi, G.
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2018), 2018, 143 : 426 - 433
  • [30] Sentiment Analysis of Chinese Microblogs Based on Layered Features
    Wang, Dongfang
    Li, Fang
    NEURAL INFORMATION PROCESSING (ICONIP 2014), PT II, 2014, 8835 : 361 - 368