LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models

Cited by: 0

Authors:

Pan, Rongqi [1]

Ghaleb, Taher A. [2,3]

Briand, Lionel C. [4,5]

Affiliations:

[1] Univ Ottawa, Sch EECS, Ottawa, ON K1N 6N5, Canada

[2] Trent Univ, Comp Sci Dept, Peterborough, ON K9L 0G2, Canada

[3] Univ Ottawa, Ottawa, ON K1N 6N5, Canada

[4] Univ Limerick, Lero SFI Ctr Software Res, Limerick V94T9PX, Ireland

[5] Univ Ottawa, Sch EECS, Ottawa, ON K1N 6N5, Canada

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / No. 11

Funding:

Natural Sciences and Engineering Research Council of Canada; Science Foundation Ireland;

Keywords:

Minimization; Codes; Fault detection; Closed box; Scalability; Time measurement; Genetic algorithms; Source coding; Vectors; Unified modeling language; Test suite minimization; test suite reduction; pre-trained language models; genetic algorithm; black-box testing; SELECTION; PRIORITIZATION;

DOI:

10.1109/TSE.2024.3469582

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. The former yields higher fault detection rates (FDR) while the latter is faster. To address scalability while retaining a high FDR, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five different pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, on which we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time.
Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.

Pages: 3053-3070

Page count: 18
