Annotated news corpora and a lexicon for sentiment analysis in Slovene

Cited by: 14
Authors
Bučar, Jože [1, 2]
Žnidaršič, Martin [4]
Povh, Janez [2, 3]
Affiliations
[1] Real Estate Mass Valuat Syst Surveying & Mapping, Ljubljana, Slovenia
[2] Fac Informat Studies, Lab Data Technol, Novo Mesto, Slovenia
[3] Fac Mech Engn, Lab Engn Design, Ljubljana, Slovenia
[4] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
Keywords
News corpus; Sentiment analysis; Lexicon; Annotated corpus; Corpus linguistics; Web-crawling; Word list; AFINN; Slovene; Machine learning; Document classification; Monitoring sentiment dynamics; TEXTS
DOI
10.1007/s10579-018-9413-3
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this study, we introduce Slovene web-crawled news corpora with sentiment annotation on three levels of granularity: sentence, paragraph and document levels. We describe the methodology and tools that were required for their construction. The corpora contain more than 250,000 documents with political, business, economic and financial content from five Slovene media resources on the web. More than 10,000 of them were manually annotated as negative, neutral or positive. All corpora are publicly available under a Creative Commons copyright license. We used the annotated documents to construct a Slovene sentiment lexicon, which is the first of its kind for Slovene, and to assess the sentiment classification approaches used. The constructed corpora were also utilised to monitor within-the-document sentiment dynamics, its changes over time and relations with news topics. We show that sentiment is, on average, more explicit at the beginning of documents, and it loses sharpness towards the end of documents.
Pages: 895-919
Page count: 25
Source: Language Resources and Evaluation, 2018, 52: 895-919