IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

Authors
Uniyal, Deepak [1]
Agarwal, Amit [2]
Affiliations
[1] Graph Era Univ, Dehra Dun, Uttarakhand, India
[2] IIT Roorkee, Roorkee, Uttarakhand, India
Source
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II | 2021, Vol. 1525
Keywords
COVID-19; Twitter; Indian Regional Languages; Natural Language Processing
DOI
10.1007/978-3-030-93733-1_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
COVID-19, which emerged in Wuhan, China, in December 2019, continues to spread rapidly across the world even though authorities have made a number of vaccines available. Although the coronavirus has been circulating for a significant period of time, people and authorities still feel the need for awareness because the virus mutates, and its symptoms and prevention strategies vary accordingly. People and authorities rely most heavily on social media platforms to share awareness information and voice their opinions, because these platforms can spread information widely in practically no time. People communicate on social media in a number of languages, chosen according to their familiarity with the language, its reach, and its availability on the platform. The entire world has been hit by the coronavirus, and India is the second worst-hit country in terms of the number of active coronavirus cases. India, being a multilingual country, offers a great opportunity to study the reach of the various languages actively used across social media platforms. In this study, we examine a COVID-19 dataset collected between February 2020 and July 2020, focusing specifically on regional languages in India. The dataset could help the Government of India, state governments, NGOs, researchers, and policymakers study different issues related to the pandemic. We found that English was the mode of communication in over 64% of tweets, while as many as twelve Indian regional languages account for approximately 4.77% of tweets.
Pages: 309-324
Page count: 16