Annotated news corpora and a lexicon for sentiment analysis in Slovene

Cited by: 14
Authors
Bucar, Joze [1 ,2 ]
Znidarsic, Martin [4 ]
Povh, Janez [2 ,3 ]
Affiliations
[1] Real Estate Mass Valuat Syst Surveying & Mapping, Ljubljana, Slovenia
[2] Fac Informat Studies, Lab Data Technol, Novo Mesto, Slovenia
[3] Fac Mech Engn, Lab Engn Design, Ljubljana, Slovenia
[4] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
Keywords
News corpus; Sentiment analysis; Lexicon; Annotated corpus; Corpus linguistics; Web-crawling; Word list; AFINN; Slovene; Machine learning; Document classification; Monitoring sentiment dynamics; TEXTS;
DOI
10.1007/s10579-018-9413-3
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
In this study, we introduce Slovene web-crawled news corpora with sentiment annotation on three levels of granularity: sentence, paragraph and document levels. We describe the methodology and tools that were required for their construction. The corpora contain more than 250,000 documents with political, business, economic and financial content from five Slovene media resources on the web. More than 10,000 of them were manually annotated as negative, neutral or positive. All corpora are publicly available under a Creative Commons copyright license. We used the annotated documents to construct a Slovene sentiment lexicon, which is the first of its kind for Slovene, and to assess the sentiment classification approaches used. The constructed corpora were also utilised to monitor within-the-document sentiment dynamics, its changes over time and relations with news topics. We show that sentiment is, on average, more explicit at the beginning of documents, and it loses sharpness towards the end of documents.
Pages: 895-919
Page count: 25
Related papers
50 in total
  • [31] Automatically Constructing a Fine-Grained Sentiment Lexicon for Sentiment Analysis
    Yabing Wang
    Guimin Huang
    Maolin Li
    Yiqun Li
    Xiaowei Zhang
    Hui Li
    Cognitive Computation, 2023, 15 : 254 - 271
  • [32] Exploring Twitter News Biases Using Urdu-based Sentiment Lexicon
    Amjad, Kamran
    Ishtiaq, Maria
    Firdous, Samar
    Mehmood, Muhammad Amir
    2017 INTERNATIONAL CONFERENCE ON OPEN SOURCE SYSTEMS & TECHNOLOGIES (ICOSST), 2017, : 48 - 53
  • [33] Annotated Amharic Corpora
    Rychly, Pavel
    Suchomel, Vit
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 295 - 302
  • [34] Lexicon-based Comments-oriented News Sentiment Analyzer system
    Moreo, A.
    Romero, M.
    Castro, J. L.
    Zurita, J. M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (10) : 9166 - 9180
  • [35] A Thesaurus-Based Sentiment Lexicon for Danish: The Danish Sentiment Lexicon
    Nimb, Sanni
    Olsen, Sussi
    Pedersen, Bolette S.
    Troelsgard, Thomas
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2826 - 2832
  • [36] Annotated Corpus for Sentiment Analysis in Odia Language
    Mohanty, Gaurav
    Mishra, Pruthwik
    Mamidi, Radhika
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2788 - 2795
  • [37] Towards the Lexicon-Based Sentiment Analysis of Polish Texts: Polarity Lexicon
    Haniewicz, Konstanty
    Rutkowski, Wojciech
    Adamczyk, Magdalena
    Kaczmarek, Monika
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, 2013, 8083 : 286 - 295
  • [38] Sentiment Spreading: An Epidemic Model for Lexicon-Based Sentiment Analysis on Twitter
    Pollacci, Laura
    Sirbu, Alina
    Giannotti, Fosca
    Pedreschi, Dino
    Lucchese, Claudio
    Muntean, Cristina Ioana
    AI*IA 2017 ADVANCES IN ARTIFICIAL INTELLIGENCE, 2017, 10640 : 114 - 127
  • [39] Automatic Indonesian Sentiment Lexicon Curation with Sentiment Valence Tuning for Social Media Sentiment Analysis
    Wijayanti, Rini
    Arisal, Andria
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (01)
  • [40] Corpora For Sentiment Analysis Of Arabic Text In Social Media
    Itani, Maher
    Roast, Chris
    Al-Khayatt, Samir
    2017 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2017, : 64 - 69