Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa

Cited by: 3
Authors
Mapundu, Michael T. [1]
Kabudula, Chodziwadziwa W. [1,2]
Musenge, Eustasius [1]
Olago, Victor [3]
Celik, Turgay [4,5]
Affiliations
[1] Univ Witwatersrand, Sch Publ Hlth, Dept Epidemiol & Biostat, Johannesburg, South Africa
[2] Univ Witwatersrand, Wits Rural Publ Hlth & Hlth Transit Res Unit Aginc, MRC, Johannesburg, South Africa
[3] Natl Hlth Lab Serv NHLS, Natl Canc Registry, Johannesburg, South Africa
[4] Univ Witwatersrand, Wits Inst Data Sci, Johannesburg, South Africa
[5] Univ Witwatersrand, Sch Elect & Informat Engn, Johannesburg, South Africa
Funding
Wellcome Trust (UK)
Keywords
cause of death; machine learning; Verbal Autopsy; CCVA; algorithms; data-derived algorithms
DOI
10.3389/fpubh.2022.990838
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Computer Coded Verbal Autopsy (CCVA) algorithms are commonly used to determine the cause of death (CoD) from questionnaire responses extracted from verbal autopsies (VAs). However, they can only operate on structured data and cannot effectively harness information from unstructured VA narratives. Machine Learning (ML) algorithms have also been applied successfully in determining the CoD from VA narratives, allowing the use of auxiliary information that CCVA algorithms cannot directly utilize. However, most ML-based studies only use responses from the structured questionnaire, and the results lack generalisability and comparability across studies. We present a comparative performance evaluation of ML methods and CCVA algorithms on South African VA narrative data from the Agincourt Health and Demographic Surveillance Site (HDSS), with physicians' classifications as the gold standard. The data were collected from 1993 to 2015 and comprise 16,338 cases. The random forest and extreme gradient boosting classifiers outperformed the other classifiers on the combined dataset, each attaining an accuracy of 96%, with statistically significant differences in algorithmic performance (p < 0.0001). All our models attained an Area Under the Receiver Operating Characteristic curve (AUROC) greater than 0.884. The InterVA CCVA attained 83% Cause Specific Mortality Fraction (CSMF) accuracy and an Overall Chance-Corrected Concordance of 0.36. We demonstrate that ML models can accurately determine the cause of death from VA narratives. Additionally, through mortality trends and pattern analysis, we found that in the first decade of the civil registration system in South Africa, the average life expectancy was approximately 50 years. However, in the second decade, life expectancy dropped significantly, and the population was dying at a much younger average age of 40 years, mostly from the leading HIV-related causes.
Interestingly, in the third decade, we see a gradual improvement in life expectancy, possibly attributable to effective health intervention programmes. Through a structural and semantic analysis of narratives on which experts disagree, we also identify the most frequent terms relating to traditional healer consultations and visits. The comparative approach also makes this study a baseline for future research, supporting generalisability and comparability. Future work will explore deep learning models for CoD classification.
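The two CCVA metrics named in the abstract, CSMF accuracy and chance-corrected concordance, can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in the verbal autopsy literature: CSMF accuracy is one minus the summed absolute error in cause fractions, scaled by the worst possible error, and the overall chance-corrected concordance is the unweighted mean of the per-cause values. The function names and the toy cause labels are hypothetical, and this is not the authors' code.

```python
from collections import Counter


def csmf_accuracy(true_causes, predicted_causes):
    """CSMF accuracy: 1 - sum|true_j - pred_j| / (2 * (1 - min_j true_j)).

    Assumed definition; fractions are taken over the same set of deaths.
    """
    n = len(true_causes)
    true_counts = Counter(true_causes)
    pred_counts = Counter(predicted_causes)
    causes = set(true_counts) | set(pred_counts)
    abs_error = sum(abs(true_counts[c] / n - pred_counts[c] / n) for c in causes)
    worst_case = 2 * (1 - min(true_counts[c] / n for c in causes))
    if worst_case == 0:
        # Only one cause exists, so no misallocation of fractions is possible.
        return 1.0
    return 1 - abs_error / worst_case


def overall_ccc(true_causes, predicted_causes):
    """Unweighted mean over causes of (sensitivity_j - 1/N) / (1 - 1/N).

    N is the number of distinct true causes (assumed to be at least 2).
    """
    causes = sorted(set(true_causes))
    n_causes = len(causes)
    chance = 1 / n_causes
    scores = []
    for cause in causes:
        # Predictions made for the deaths whose true cause is `cause`.
        assigned = [p for t, p in zip(true_causes, predicted_causes) if t == cause]
        sensitivity = sum(p == cause for p in assigned) / len(assigned)
        scores.append((sensitivity - chance) / (1 - chance))
    return sum(scores) / n_causes


# Toy example with made-up labels (not data from the study).
truth = ["HIV", "HIV", "TB", "TB", "Injury", "Injury"]
predicted = ["HIV", "HIV", "TB", "HIV", "Injury", "TB"]
print(round(csmf_accuracy(truth, predicted), 3))
print(round(overall_ccc(truth, predicted), 3))
```

CSMF accuracy measures agreement at the population level, while chance-corrected concordance measures agreement at the level of individual deaths, so a method can score well on the first and poorly on the second, as in the InterVA figures reported above.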
Pages: 20