CodeWMBench: An Automated Benchmark for Code Watermarking Evaluation

Times Cited: 0
Authors
Wu, Benlong [1]
Chen, Kejiang [1]
He, Yanru [1]
Chen, Guoqiang [1]
Zhang, Weiming [1]
Yu, Nenghai [1]
Affiliations
[1] University of Science and Technology of China, Hefei, Anhui, China
Keywords
Programming language model; code watermark; benchmark; software watermarking
DOI
10.1145/3674399.3674447
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As deep learning progresses, programming language generation models such as CodeLlama, GitHub Copilot, and ChatGPT have been widely applied to intelligent code development. However, they also lower the cost of code plagiarism, posing challenges to copyright protection and academic integrity. In response to the specific need to distinguish human-written from machine-generated code, this paper introduces CodeWMBench, a comprehensive automated benchmark for the active detection of machine-generated code through watermarking. Through a meticulous evaluation of eight code watermarking methods, we demonstrate their performance in terms of harmlessness, robustness, and transparency. In particular, we introduce, for the first time, watermark removal techniques based on large language models, and we conduct the first assessment of these watermarking methods against code rewriting and re-translation attacks. In the discussion, we examine the critical issues currently facing code watermarking, including why existing code watermarking methods struggle to resist removal by large language models, and what future methods might withstand such removal.
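To make the abstract's attack model concrete, the sketch below illustrates how such rewriting and re-translation attacks might be wired up against a watermark detector. It is a minimal illustration under assumed interfaces, not the paper's implementation: query_llm, detect_watermark, the prompts, and the Java pivot language are all hypothetical placeholders, not CodeWMBench APIs.

# A minimal, hypothetical sketch of LLM-based watermark removal in the
# spirit of the rewriting and re-translation attacks described above.
# None of these names come from CodeWMBench: query_llm stands for any
# function that sends a prompt to a code LLM and returns its reply, and
# detect_watermark stands for the detector of the method under test.

from typing import Callable

LLM = Callable[[str], str]        # prompt in, completion out
Detector = Callable[[str], bool]  # code in, "watermark detected?" out

def rewrite_attack(code: str, query_llm: LLM) -> str:
    """Ask the LLM to rewrite the code with identical behavior but a
    different surface form (identifiers, structure, style)."""
    prompt = (
        "Rewrite the following code so that it behaves identically but "
        "uses different variable names, structure, and style:\n" + code
    )
    return query_llm(prompt)

def retranslation_attack(code: str, query_llm: LLM, pivot: str = "Java") -> str:
    """Round-trip the code through a pivot language; translation tends to
    erase token-level watermark patterns. Java is an arbitrary choice."""
    pivoted = query_llm(f"Translate this code to {pivot}:\n" + code)
    return query_llm(
        f"Translate this {pivot} code back to its original language:\n" + pivoted
    )

def evaluate_robustness(code: str, query_llm: LLM, detect_watermark: Detector) -> dict:
    """A robust watermark should still be detected (True) after both attacks."""
    return {
        "no_attack": detect_watermark(code),
        "rewrite": detect_watermark(rewrite_attack(code, query_llm)),
        "retranslation": detect_watermark(retranslation_attack(code, query_llm)),
    }

Under this setup, a watermarking method counts as robust only if detect_watermark still returns True on the attacked outputs; the abstract notes that existing methods struggle to resist exactly this kind of LLM-based removal.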
Pages: 120-125
Page count: 6