Trend and Co-occurrence Network of COVID-19 Symptoms From Large-Scale Social Media Data: Infoveillance Study

Cited by: 6
Authors
Wu, Jiageng [1 ,2 ,3 ]
Wang, Lumin [1 ,2 ,3 ]
Hua, Yining [4 ,5 ]
Li, Minghui [1 ,2 ,3 ]
Zhou, Li [4 ,5 ]
Bates, David W. [4 ,5 ]
Yang, Jie [1 ,2 ,3 ]
Affiliations
[1] Zhejiang Univ, Sch Publ Hlth, Sch Med, 866 Yuhangtang Rd, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Affiliated Hosp 2, Sch Med, 866 Yuhangtang Rd, Hangzhou 310058, Peoples R China
[3] Key Lab Intelligent Prevent Med Zhejiang Prov, Hangzhou, Peoples R China
[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA
[5] Brigham & Womens Hosp, Div Gen Internal Med & Primary Care, Boston, MA USA
Funding
National Institutes of Health (United States);
Keywords
social media; network analysis; public health; data mining; COVID-19;
DOI
10.2196/45419
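Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract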
Background: For an emergent pandemic, such as COVID-19, the statistics of symptoms based on hospital data may be biased or delayed due to the high proportion of asymptomatic or mild-symptom infections that are not recorded in hospitals. Meanwhile, the difficulty in accessing large-scale clinical data also limits many researchers from conducting timely research.
Objective: Given the wide coverage and promptness of social media, this study aimed to present an efficient workflow to track and visualize the dynamic characteristics and co-occurrence of symptoms for the COVID-19 pandemic from large-scale and long-term social media data.
Methods: This retrospective study included 471,553,966 COVID-19-related tweets from February 1, 2020, to April 30, 2022. We curated a hierarchical symptom lexicon for social media containing 10 affected organs/systems, 257 symptoms, and 1808 synonyms. The dynamic characteristics of COVID-19 symptoms over time were analyzed from the perspectives of weekly new cases, overall distribution, and temporal prevalence of reported symptoms. The symptom evolutions between virus strains (Delta and Omicron) were investigated by comparing the symptom prevalence during their dominant periods. A co-occurrence symptom network was developed and visualized to investigate inner relationships among symptoms and affected body systems.
Results: This study identified 201 COVID-19 symptoms and grouped them into 10 affected body systems. There was a significant correlation between the weekly quantity of self-reported symptoms and new COVID-19 infections (Pearson correlation coefficient=0.8528; P<.001). We also observed a 1-week leading trend (Pearson correlation coefficient=0.8802; P<.001) between them. The frequency of symptoms showed dynamic changes as the pandemic progressed, from typical respiratory symptoms in the early stage to more musculoskeletal and nervous symptoms in the later stages. We identified the difference in symptoms between the Delta and Omicron periods. There were fewer severe symptoms (coma and dyspnea), more flu-like symptoms (throat pain and nasal congestion), and fewer typical COVID symptoms (anosmia and taste altered) in the Omicron period than in the Delta period (all P<.001). Network analysis revealed co-occurrences among symptoms and systems corresponding to specific disease progressions, including palpitations (cardiovascular) and dyspnea (respiratory), and alopecia (musculoskeletal) and impotence (reproductive).
Conclusions: This study identified more and milder COVID-19 symptoms than clinical research and characterized the dynamic symptom evolution based on 400 million tweets over 27 months. The symptom network revealed potential comorbidity risk and prognostic disease progression. These findings demonstrate that the cooperation of social media and a well-designed workflow can depict a holistic picture of pandemic symptoms to complement clinical studies.
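Illustrative sketch (not part of the original record): the abstract describes three steps that may be easier to follow in code, namely matching tweets against a symptom lexicon, counting symptom co-occurrences as network edges, and correlating weekly symptom counts with new cases at a 1-week lead. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The toy `LEXICON`, the substring matching rule, and all function names are hypothetical; the study's actual lexicon has 257 symptoms and 1808 synonyms, and its matching procedure is not specified in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy lexicon mapping synonym -> canonical symptom.
# This is an assumption for illustration, not the study's lexicon.
LEXICON = {
    "sore throat": "throat pain",
    "throat pain": "throat pain",
    "stuffy nose": "nasal congestion",
    "short of breath": "dyspnea",
    "loss of smell": "anosmia",
}


def extract_symptoms(text):
    """Return the canonical symptoms whose synonyms occur in the text.

    Simple case-insensitive substring matching is assumed here.
    """
    lowered = text.lower()
    return {symptom for synonym, symptom in LEXICON.items() if synonym in lowered}


def cooccurrence_counts(texts):
    """Count how often each unordered symptom pair appears in the same text.

    Each pair can be read as a weighted edge of a co-occurrence network.
    """
    counts = Counter()
    for text in texts:
        for pair in combinations(sorted(extract_symptoms(text)), 2):
            counts[pair] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_pearson(symptom_counts, case_counts, lead_weeks):
    """Correlate symptoms in week t with cases in week t + lead_weeks."""
    if lead_weeks == 0:
        return pearson(symptom_counts, case_counts)
    return pearson(symptom_counts[:-lead_weeks], case_counts[lead_weeks:])


if __name__ == "__main__":
    tweets = [
        "I have a sore throat and stuffy nose",
        "Sore throat, loss of smell",
        "stuffy nose and throat pain today",
    ]
    print(cooccurrence_counts(tweets))

    # Made-up weekly series in which cases follow symptoms by one week.
    weekly_symptoms = [1, 3, 2, 5, 4]
    weekly_cases = [0, 2, 6, 4, 10]
    print(lagged_pearson(weekly_symptoms, weekly_cases, 0))
    print(lagged_pearson(weekly_symptoms, weekly_cases, 1))
```

A higher coefficient at `lead_weeks=1` than at `lead_weeks=0` is the kind of pattern the abstract calls a 1-week leading trend; the numbers in this sketch are invented and carry no empirical meaning.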
Pages: 16