Annotated news corpora and a lexicon for sentiment analysis in Slovene

Cited by: 14
Authors
Bucar, Joze [1 ,2 ]
Znidarsic, Martin [4 ]
Povh, Janez [2 ,3 ]
Affiliations
[1] Real Estate Mass Valuat Syst Surveying & Mapping, Ljubljana, Slovenia
[2] Fac Informat Studies, Lab Data Technol, Novo Mesto, Slovenia
[3] Fac Mech Engn, Lab Engn Design, Ljubljana, Slovenia
[4] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
Keywords
News corpus; Sentiment analysis; Lexicon; Annotated corpus; Corpus linguistics; Web-crawling; Word list; AFINN; Slovene; Machine learning; Document classification; Monitoring sentiment dynamics; TEXTS;
DOI
10.1007/s10579-018-9413-3
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this study, we introduce Slovene web-crawled news corpora with sentiment annotation on three levels of granularity: sentence, paragraph and document levels. We describe the methodology and tools that were required for their construction. The corpora contain more than 250,000 documents with political, business, economic and financial content from five Slovene media resources on the web. More than 10,000 of them were manually annotated as negative, neutral or positive. All corpora are publicly available under a Creative Commons copyright license. We used the annotated documents to construct a Slovene sentiment lexicon, which is the first of its kind for Slovene, and to assess the sentiment classification approaches used. The constructed corpora were also utilised to monitor within-the-document sentiment dynamics, its changes over time and relations with news topics. We show that sentiment is, on average, more explicit at the beginning of documents, and it loses sharpness towards the end of documents.
Pages: 895-919
Number of pages: 25