IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

Cited by: 0
Authors
Uniyal, Deepak [1 ]
Agarwal, Amit [2 ]
Affiliations
[1] Graph Era Univ, Dehra Dun, Uttarakhand, India
[2] IIT Roorkee, Roorkee, Uttarakhand, India
Source
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II | 2021 / Vol. 1525
Keywords
COVID-19; Twitter; Indian Regional Languages; Natural Language Processing
DOI
10.1007/978-3-030-93733-1_22
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
COVID-19, which emerged in Wuhan, China, in December 2019, continues to spread rapidly across the world even though authorities have made a number of vaccines available. Although the coronavirus has been around for a significant period of time, people and authorities still feel the need for awareness because the virus mutates, and symptoms and prevention strategies vary accordingly. People and authorities rely most on social media platforms to share awareness information and voice their opinions, because these platforms can spread the word to a massive audience in practically no time. People communicate on social media in a number of languages, chosen according to their familiarity, the language's outreach, and its availability on the platform. The entire world has been hit by the coronavirus, and India is the second worst-hit country in terms of the number of active coronavirus cases. India, being a multilingual country, offers a great opportunity to study the outreach of the various languages actively used across social media platforms. In this study, we examine a COVID-19-related dataset collected between February 2020 and July 2020, focusing specifically on regional languages in India. This could help the Government of India, state governments, NGOs, researchers, and policymakers study different issues related to the pandemic. We found that English was the mode of communication in over 64% of tweets, while as many as twelve regional languages in India account for approximately 4.77% of tweets.
Pages: 309-324
Page count: 16
Related papers
50 in total
  • [21] A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research-An International Collaboration
    Banda, Juan M.
    Tekumalla, Ramya
    Wang, Guanyu
    Yu, Jingyuan
    Liu, Tuo
    Ding, Yuning
    Artemova, Ekaterina
    Tutubalina, Elena
    Chowell, Gerardo
    EPIDEMIOLOGIA, 2021, 2 (03): 315 - 324
  • [22] Effects of COVID-19 on Multilingual Communication
    Pilgun, Maria
    Raskhodchikov, Aleksei N.
    Koreneva Antonova, Olga
    FRONTIERS IN PSYCHOLOGY, 2022, 12
  • [23] A large multiclass dataset of CT scans for COVID-19 identification
    Soares, Eduardo
    Angelov, Plamen
    Biaso, Sarah
    Cury, Marcelo
    Abe, Daniel
    EVOLVING SYSTEMS, 2024, 15 (02) : 635 - 640
  • [25] Opinions on Homeopathy for COVID-19 on Twitter
    Bopaiah, Jeevith
    Garimella, Kiran
    Kavuluru, Ramakanth
    PROCEEDINGS OF THE 14TH ACM WEB SCIENCE CONFERENCE, WEBSCI 2022, 2022: 359 - 363
  • [26] Politicization of the Discussion of COVID-19 on "Twitter"
    Ovchinnikova, Irina G.
    Ermakova, Liana M.
    Nurbakova, Diana M.
    FILOLOGICHESKIE NAUKI-NAUCHNYE DOKLADY VYSSHEI SHKOLY-PHILOLOGICAL SCIENCES-SCIENTIFIC ESSAYS OF HIGHER EDUCATION, 2021, (06): 3 - 11
  • [27] A national discussion of COVID-19 on Twitter
    Diaz, Marlon I.
    Lehmann, Christoph U.
    Lam, Philip W.
    Medford, Richard J.
    JOURNAL OF THE ASSOCIATION OF MEDICAL MICROBIOLOGY AND INFECTIOUS DISEASE CANADA (JAMMI), 2024, 9 (04): 294 - 307
  • [28] COVID-19 and Misinformation: A Large-Scale Lexical Analysis on Twitter
    Antypas, Dimosthenis
    Rogers, David
    Preece, Alun
    Camacho-Collados, Jose
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021: 119 - 126
  • [29] Indian Healthcare Infrastructure Analysis during COVID-19 using Twitter Sentiments
    Fatima, Noor
    Belal, Mohd
    Kumar, Kaushal
    Sadaf, Rumi
    2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022: 270 - 274
  • [30] ANTi-Vax: a novel Twitter dataset for COVID-19 vaccine misinformation detection
    Hayawi, K.
    Shahriar, S.
    Serhani, M. A.
    Taleb, I.
    Mathew, S. S.
    PUBLIC HEALTH, 2022, 203 : 23 - 30