ChatGPT and Bard Performance on the POSCOMP Exam

Cited by: 0

Authors:

Saldanha, Mateus Santos [1]

Digiampietri, Luciano Antonio [1]

Affiliation:

[1] Univ Sao Paulo, Sao Paulo, SP, Brazil

Source:

PROCEEDINGS OF THE 20TH BRAZILIAN SYMPOSIUM ON INFORMATION SYSTEMS, SBSI 2024 | 2024

Keywords:

Large Language Model; ChatBot; Computer Science Examination; ChatGPT; Bard

DOI:

10.1145/3658271.3658320

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Context: Modern chatbots, built upon advanced language models, have achieved remarkable proficiency in answering questions across diverse fields. Problem: Understanding the capabilities and limitations of these chatbots is a significant challenge, particularly as they are integrated into different information systems, including those in education. Solution: In this study, we conducted a quantitative assessment of the ability of two prominent chatbots, ChatGPT and Bard, to solve POSCOMP questions. IS Theory: The IS theory used in this work is Information processing theory. Method: We used a total of 271 questions from the last five POSCOMP exams that did not rely on graphic content as our materials. We presented these questions to the two chatbots in two formats: directly as they appeared in the exam and with additional context. In the latter case, the chatbots were informed that they were answering a multiple-choice question from a computing exam. Summary of Results: On average, chatbots outperformed human exam-takers by more than 20%. Interestingly, both chatbots performed better, on average, without additional context added to the prompt. They exhibited similar performance levels, with a slight advantage observed for ChatGPT. Contributions and Impact in the IS area: The primary contribution to the field involves the exploration of the capabilities and limitations of chatbots in addressing computing-related questions. This information is valuable for individuals developing Information Systems with the assistance of such chatbots or those relying on technologies built upon these capabilities.

Pages: 10