Assessing the performance of AI chatbots in answering patients' common questions about low back pain

Cited by: 1
Authors
Scaff, Simone P. S. [1]
Reis, Felipe J. J. [2,3]
Ferreira, Giovanni E. [4]
Jacob, Maria Fernanda [1]
Saragiotto, Bruno T. [1,5]
Affiliations
[1] Univ Cidade Sao Paulo, Masters & Doctoral Programs Phys Therapy, Sao Paulo, Brazil
[2] Inst Fed Rio de Janeiro, Phys Therapy Dept, Rio De Janeiro, Brazil
[3] Vrije Univ Brussel, Dept Physiotherapy Human Physiol & Anat, Brussels, Belgium
[4] Univ Sydney, Inst Musculoskeletal Hlth, Sydney, NSW, Australia
[5] Univ Technol, Fac Hlth, Grad Sch Hlth, Discipline Physiotherapy, Sydney, NSW, Australia
Keywords
Low Back Pain; Internet; Pain; Health; Care; Information; Guidelines
DOI
10.1136/ard-2024-226202
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objectives: The aim of this study was to assess the accuracy and readability of the answers generated by large language model (LLM)-chatbots to common patient questions about low back pain (LBP).
Methods: This cross-sectional study analysed responses to 30 LBP-related questions, covering self-management, risk factors and treatment. The questions were developed by experienced clinicians and researchers and were piloted with a group of consumer representatives with lived experience of LBP. The questions were entered as prompts into ChatGPT 3.5, Bing, Bard (Gemini) and ChatGPT 4.0. Responses were evaluated for accuracy, readability and the presence of disclaimers about health advice. Accuracy was assessed by comparing the recommendations generated with the main guidelines for LBP. The responses were analysed by two independent reviewers and classified as accurate, inaccurate or unclear. Readability was measured with the Flesch Reading Ease Score (FRES).
Results: Out of 120 responses yielding 1069 recommendations, 55.8% were accurate, 42.1% inaccurate and 1.9% unclear. Treatment and self-management domains showed the highest accuracy while risk factors had the most inaccuracies. Overall, LLM-chatbots provided answers that were 'reasonably difficult' to read, with a mean (SD) FRES of 50.94 (3.06). Disclaimers about health advice were present in around 70%-100% of the responses produced.
Conclusions: The use of LLM-chatbots as tools for patient education and counselling in LBP shows promising but variable results. These chatbots generally provide moderately accurate recommendations. However, the accuracy may vary depending on the topic of each question. The readability level of the answers was inadequate, potentially affecting the patient's ability to comprehend the information.
Pages: 143-149
Page count: 7
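Note on the readability measure: the abstract reports readability as a Flesch Reading Ease Score (FRES). As a minimal sketch, the standard published Flesch formula can be computed from raw word, sentence and syllable counts as shown below. The function names `flesch_reading_ease` and `fres_band` are hypothetical, the band labels follow the conventional Flesch wording rather than the study's own, and the record does not say which tool the authors used to obtain the counts.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts, using the standard formula:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    How words, sentences and syllables are counted is left to the caller
    (an assumption; counting rules differ between tools)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def fres_band(score: float) -> str:
    """Map a score to the conventional Flesch difficulty band.
    Labels are the commonly cited ones, not taken from the study."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    score = flesch_reading_ease(words=100, sentences=5, syllables=150)
    print(round(score, 3), fres_band(score))
    # The mean reported in the abstract falls in the 50 to 60 band.
    print(fres_band(50.94))
```

Under these conventional bands, the reported mean of 50.94 sits at the lower edge of the 50 to 60 range; higher scores indicate easier text.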