Assessing the performance of AI chatbots in answering patients' common questions about low back pain

Cited by: 1
Authors
Scaff, Simone P. S. [1]
Reis, Felipe J. J. [2, 3]
Ferreira, Giovanni E. [4]
Jacob, Maria Fernanda [1]
Saragiotto, Bruno T. [1, 5]
Affiliations
[1] Univ Cidade Sao Paulo, Masters & Doctoral Programs Phys Therapy, Sao Paulo, Brazil
[2] Inst Fed Rio de Janeiro, Phys Therapy Dept, Rio De Janeiro, Brazil
[3] Vrije Univ Brussel, Dept Physiotherapy Human Physiol & Anat, Brussels, Belgium
[4] Univ Sydney, Inst Musculoskeletal Hlth, Sydney, NSW, Australia
[5] Univ Technol, Fac Hlth, Grad Sch Hlth, Discipline Physiotherapy, Sydney, NSW, Australia
Keywords
Low Back Pain; Internet; Pain; HEALTH; CARE; INFORMATION; GUIDELINES;
DOI
10.1136/ard-2024-226202
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
Objectives: To assess the accuracy and readability of the answers generated by large language model (LLM) chatbots to common patient questions about low back pain (LBP).
Methods: This cross-sectional study analysed responses to 30 LBP-related questions covering self-management, risk factors and treatment. The questions were developed by experienced clinicians and researchers and piloted with a group of consumer representatives with lived experience of LBP. The questions were entered as prompts into ChatGPT 3.5, Bing, Bard (Gemini) and ChatGPT 4.0. Responses were evaluated for accuracy, readability and the presence of disclaimers about health advice. Accuracy was assessed by comparing the recommendations generated with the main guidelines for LBP. Two independent reviewers analysed the responses and classified each recommendation as accurate, inaccurate or unclear. Readability was measured with the Flesch Reading Ease Score (FRES).
Results: Out of 120 responses yielding 1069 recommendations, 55.8% were accurate, 42.1% inaccurate and 1.9% unclear. The treatment and self-management domains showed the highest accuracy, while risk factors had the most inaccuracies. Overall, LLM chatbots provided answers that were 'reasonably difficult' to read, with a mean (SD) FRES of 50.94 (3.06). Disclaimers about health advice were present in around 70%-100% of the responses produced.
Conclusions: The use of LLM chatbots as tools for patient education and counselling in LBP shows promising but variable results. These chatbots generally provide moderately accurate recommendations, although accuracy may vary with the topic of each question. The readability level of the answers was inadequate, potentially affecting the patient's ability to comprehend the information.
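For readers unfamiliar with the readability measure named in the Methods, the following is a minimal sketch of the standard Flesch Reading Ease formula, which is general knowledge and not taken from this record. The function name and the pre-counted inputs are illustrative assumptions; the abstract does not say which tool or syllable-counting method the authors used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease Score from pre-counted text statistics.

    Hypothetical helper for illustration only; how the study counted
    words, sentences and syllables is not stated in the abstract.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Longer sentences and more syllables per word both lower the score.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 3))
```

Higher scores indicate easier text. On the commonly used Flesch scale, scores between 50 and 60 are usually labelled "fairly difficult", which is where the reported mean of 50.94 falls.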
Pages: 143-149
Number of pages: 7