CodeWMBench: An Automated Benchmark for Code Watermarking Evaluation

Times Cited: 0
Authors
Wu, Benlong [1]
Chen, Kejiang [1]
He, Yanru [1]
Chen, Guoqiang [1]
Zhang, Weiming [1]
Yu, Nenghai [1]
Affiliations
[1] University of Science and Technology of China, Hefei, Anhui, China
Keywords
Programming language model; code watermark; benchmark; software watermarking
DOI
10.1145/3674399.3674447
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As deep learning progresses, programming language generation models such as CodeLlama, GitHub Copilot, and ChatGPT have been widely applied to intelligent code development. However, they also lower the cost of code plagiarism, posing challenges to copyright protection and academic integrity. In response to the specific need to distinguish human-written from machine-generated code, this paper introduces CodeWMBench, a comprehensive automated benchmark for the active detection of machine-generated code through watermarking. Through a meticulous evaluation of eight code watermarking methods, we demonstrate their performance in terms of harmlessness, robustness, and transparency. In particular, we introduce, for the first time, watermark removal techniques based on large language models, and we conduct the first assessment of these watermarking methods against code rewriting and re-translation attacks. In the discussion, we examine the critical issues currently facing code watermarking, including why existing code watermarking methods struggle to resist removal by large language models, and what future methods might withstand such removal.
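To make the abstract's attack model concrete, the sketch below illustrates how such rewriting and re-translation attacks might be wired up against a watermark detector. It is a minimal illustration under assumed interfaces, not the paper's implementation: query_llm, detect_watermark, the prompts, and the Java pivot language are all hypothetical placeholders, not CodeWMBench APIs.

# A minimal, hypothetical sketch of LLM-based watermark removal in the
# spirit of the rewriting and re-translation attacks described above.
# None of these names come from CodeWMBench: query_llm stands for any
# function that sends a prompt to a code LLM and returns its reply, and
# detect_watermark stands for the detector of the method under test.

from typing import Callable

LLM = Callable[[str], str]        # prompt in, completion out
Detector = Callable[[str], bool]  # code in, "watermark detected?" out

def rewrite_attack(code: str, query_llm: LLM) -> str:
    """Ask the LLM to rewrite the code with identical behavior but a
    different surface form (identifiers, structure, style)."""
    prompt = (
        "Rewrite the following code so that it behaves identically but "
        "uses different variable names, structure, and style:\n" + code
    )
    return query_llm(prompt)

def retranslation_attack(code: str, query_llm: LLM, pivot: str = "Java") -> str:
    """Round-trip the code through a pivot language; translation tends to
    erase token-level watermark patterns. Java is an arbitrary choice."""
    pivoted = query_llm(f"Translate this code to {pivot}:\n" + code)
    return query_llm(
        f"Translate this {pivot} code back to its original language:\n" + pivoted
    )

def evaluate_robustness(code: str, query_llm: LLM, detect_watermark: Detector) -> dict:
    """A robust watermark should still be detected (True) after both attacks."""
    return {
        "no_attack": detect_watermark(code),
        "rewrite": detect_watermark(rewrite_attack(code, query_llm)),
        "retranslation": detect_watermark(retranslation_attack(code, query_llm)),
    }

Under this setup, a watermarking method counts as robust only if detect_watermark still returns True on the attacked outputs; the abstract notes that existing methods struggle to resist exactly this kind of LLM-based removal.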
Pages: 120-125
Page count: 6