Can GPT-4 Aid in Detecting Ambiguities, Inconsistencies, and Incompleteness in Requirements Analysis? A Comprehensive Case Study

Cited by: 0
Authors
Mahbub, Taslim [1 ]
Dghaym, Dana [1 ]
Shankarnarayanan, Aadhith [1 ]
Syed, Taufiq [1 ]
Shapsough, Salsabeel [1 ]
Zualkernan, Imran [1 ]
Affiliations
[1] Amer Univ Sharjah, Dept Comp Sci & Engn, Sharjah, U Arab Emirates
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Stakeholders; Large language models; Accuracy; Requirements engineering; Defect detection; Software development management; Software engineering; Ambiguity; completeness; GPT; inconsistency; large language models (LLMs); requirements engineering; software engineering; software requirements specifications (SRS); LARGE LANGUAGE MODELS; VALIDATION;
DOI
10.1109/ACCESS.2024.3464242
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Effective software projects hinge on robust requirements, yet flawed requirements often lead to costly delays and revisions. While tools have been developed to identify defects in Software Requirements Specifications (SRS), the advent of Large Language Models (LLMs) like GPT-4 presents new opportunities for enhancing requirements quality. However, the potential of LLMs in this realm remains largely unexplored, particularly in the context of large-scale industrial documents. To bridge this gap, we investigate the efficacy of zero-shot GPT-4 in various requirements analysis tasks using an industrial software specification document. Our study evaluates LLM performance in detecting defects such as ambiguities, inconsistencies, and incompleteness, while also analyzing GPT-4's ability to identify issues across version iterations and support technical experts in requirements analysis. Qualitatively, we identify key limitations of LLMs in defect detection, notably their inability to cross-reference information across the document and their constrained understanding of specialized contexts. Quantitatively, we find that while LLMs excel at identifying incomplete requirements (precision 0.61), their performance is weaker in detecting inconsistencies (precision 0.43) and ambiguities (precision 0.39). Although GPT-4 demonstrates promise in automating early defect detection across versions and providing accurate technical answers, our results underscore that it cannot entirely replace human analysts, given its lack of nuanced domain knowledge in a zero-shot setting. Nevertheless, avenues such as few-shot learning and complex prompt design offer the potential to enhance LLM precision in defect detection.
Pages: 171972 - 171992
Page count: 21
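
Illustration: the abstract above describes a zero-shot setup in which GPT-4 is prompted to flag ambiguities, inconsistencies, and incompleteness in requirements. The Python sketch below shows what such a setup could look like for a single requirement via the OpenAI Chat Completions API; it is not the authors' actual pipeline, and the prompt wording, JSON output schema, model name, and temperature setting are illustrative assumptions.

# Minimal sketch (assumed setup, not the paper's pipeline): zero-shot defect
# detection for one requirement using the OpenAI Chat Completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical zero-shot instruction; the paper's prompts may differ.
DEFECT_PROMPT = (
    "You are a requirements analyst. For the software requirement below, report any "
    "ambiguities, inconsistencies, or incompleteness. Respond as JSON with the keys "
    "'ambiguity', 'inconsistency', and 'incompleteness', each mapping to a list of "
    "short findings (an empty list if none)."
)

def detect_defects(requirement: str) -> str:
    """Return GPT-4's zero-shot defect report for a single requirement."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # favor reproducible output for evaluation
        messages=[
            {"role": "system", "content": DEFECT_PROMPT},
            {"role": "user", "content": requirement},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Example: a deliberately vague requirement ("quickly" is unquantified).
    print(detect_defects("The system shall respond to user requests quickly."))

In practice, the reported precision figures (0.61 / 0.43 / 0.39) would be obtained by comparing such model findings against expert-labeled defects; few-shot examples or richer prompts, as the abstract notes, are candidate ways to improve them.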