IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

Authors
Uniyal, Deepak [1]
Agarwal, Amit [2]
Affiliations
[1] Graphic Era University, Dehradun, Uttarakhand, India
[2] IIT Roorkee, Roorkee, Uttarakhand, India
Keywords
COVID-19; Twitter; Indian Regional Languages; Natural Language Processing
DOI
10.1007/978-3-030-93733-1_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Having emerged in Wuhan, China, in December 2019, COVID-19 continues to spread rapidly across the world even though a number of vaccines have been made available. Although the coronavirus has been around for a significant period of time, people and authorities still feel the need for awareness because the virus keeps mutating, so its symptoms and the corresponding prevention strategies vary. People and authorities turn to social media platforms most often to share awareness information and voice their opinions, owing to the platforms' massive reach in spreading the word almost instantly. People communicate on social media in a number of languages, depending on their familiarity with a language, the language's reach, and its availability on the platform. The entire world has been hit by the coronavirus, and India is the second worst-hit country in terms of the number of active coronavirus cases. As a multilingual country, India offers a great opportunity to study the reach of the various languages actively used across social media platforms. In this study, we examine a COVID-19 dataset collected between February 2020 and July 2020, specifically for regional languages in India. This dataset could help the Government of India, various state governments, NGOs, researchers, and policymakers study different issues related to the pandemic. We found that English was the mode of communication in over 64% of tweets, while as many as twelve regional languages of India together accounted for approximately 4.77% of tweets.
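The language shares reported in the abstract are simple proportions: the number of tweets in a language divided by the total number of tweets, times 100. The following is a minimal sketch of that tally, assuming each tweet carries a single language code (for example, a BCP 47 tag such as the `lang` field on a Twitter tweet object, where `und` means undetermined). The function names `language_shares` and `regional_share`, the `REGIONAL_CODES` set, and the sample data are hypothetical illustrations, not the authors' pipeline, their list of twelve languages, or the IRLCov19 data.

```python
from collections import Counter

# Hypothetical set of codes treated as "Indian regional" in this sketch only;
# it is NOT the paper's list of twelve regional languages.
REGIONAL_CODES = {"hi", "bn", "ta", "te", "mr", "gu", "kn", "ml"}


def language_shares(lang_codes):
    """Return each language's share of tweets as a percentage of all tweets."""
    counts = Counter(lang_codes)
    total = sum(counts.values())
    if total == 0:
        # No tweets: there is no meaningful distribution to report.
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


def regional_share(lang_codes, regional=REGIONAL_CODES):
    """Return the combined percentage of tweets whose code is in `regional`."""
    shares = language_shares(lang_codes)
    return sum(pct for lang, pct in shares.items() if lang in regional)


if __name__ == "__main__":
    # Toy, made-up per-tweet language codes (not real dataset values).
    sample = ["en"] * 13 + ["hi"] * 2 + ["ta"] + ["und"] * 4
    shares = language_shares(sample)
    print(f"English: {shares['en']:.2f}%")            # English: 65.00%
    print(f"Regional: {regional_share(sample):.2f}%")  # Regional: 15.00%
```

With the toy sample, English accounts for 13 of 20 tweets (65%) and the hypothetical regional set for 3 of 20 (15%). Tweets tagged `und` still count toward the denominator here; excluding undetermined tweets instead would be an equally defensible choice and would change the percentages.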
Pages: 309-324
Number of pages: 16