Assessing the performance of AI chatbots in answering patients' common questions about low back pain

Cited by: 1
Authors
Scaff, Simone P. S. [1]
Reis, Felipe J. J. [2, 3]
Ferreira, Giovanni E. [4]
Jacob, Maria Fernanda [1]
Saragiotto, Bruno T. [1, 5]
Affiliations
[1] Univ Cidade Sao Paulo, Masters & Doctoral Programs Phys Therapy, Sao Paulo, Brazil
[2] Inst Fed Rio de Janeiro, Phys Therapy Dept, Rio De Janeiro, Brazil
[3] Vrije Univ Brussel, Dept Physiotherapy Human Physiol & Anat, Brussels, Belgium
[4] Univ Sydney, Inst Musculoskeletal Hlth, Sydney, NSW, Australia
[5] Univ Technol, Fac Hlth, Grad Sch Hlth, Discipline Physiotherapy, Sydney, NSW, Australia
Keywords
Low Back Pain; Internet; Pain; Health; Care; Information; Guidelines
DOI
10.1136/ard-2024-226202
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Objectives: To assess the accuracy and readability of the answers generated by large language model (LLM)-chatbots to common patient questions about low back pain (LBP).
Methods: This cross-sectional study analysed responses to 30 LBP-related questions covering self-management, risk factors and treatment. The questions were developed by experienced clinicians and researchers and were piloted with a group of consumer representatives with lived experience of LBP. The questions were entered as prompts into ChatGPT 3.5, Bing, Bard (Gemini) and ChatGPT 4.0. Responses were evaluated for accuracy, readability and the presence of disclaimers about health advice. Accuracy was assessed by comparing the generated recommendations with the main guidelines for LBP. The responses were analysed by two independent reviewers and classified as accurate, inaccurate or unclear. Readability was measured with the Flesch Reading Ease Score (FRES).
Results: Of the 1069 recommendations extracted from 120 responses, 55.8% were accurate, 42.1% inaccurate and 1.9% unclear. The treatment and self-management domains showed the highest accuracy, while risk factors had the most inaccuracies. Overall, LLM-chatbots provided answers that were 'reasonably difficult' to read, with a mean (SD) FRES of 50.94 (3.06). Disclaimers about health advice were present in around 70%-100% of the responses produced.
Conclusions: The use of LLM-chatbots as tools for patient education and counselling in LBP shows promising but variable results. These chatbots generally provide moderately accurate recommendations, although accuracy may vary with the topic of each question. The readability level of the answers was inadequate, potentially affecting patients' ability to comprehend the information.
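Note on the readability measure: the abstract reports FRES values but does not restate the formula. The sketch below uses the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The function name and the example counts are hypothetical and are not taken from the study, and how the authors counted words, sentences and syllables is not stated here.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Return the Flesch Reading Ease Score from pre-computed text counts.

    Assumption: the caller has already counted words, sentences and
    syllables; this sketch does not tokenise text itself.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    # Standard Flesch formula: higher scores mean easier text.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for illustration only (not data from the study).
example_score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=160)
print(f"FRES: {example_score:.2f}")
```

Longer sentences and more syllables per word both lower the score, so a mean of about 51 indicates text that is harder to read than plain, everyday prose.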
Pages: 143-149
Number of pages: 7