Query-efficient model extraction for text classification model in a hard label setting

Cited: 0
|
Authors
Peng, Hao [1 ]
Guo, Shixin [1 ]
Zhao, Dandan [1 ]
Wu, Yiming [3 ]
Han, Jianming [1 ]
Wang, Zhe [1 ]
Ji, Shouling [2 ,4 ]
Zhong, Ming [1 ]
Affiliations
[1] Zhejiang Normal Univ, Coll Comp Sci & Technol, Jinhua 321004, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310027, Zhejiang, Peoples R China
[4] Georgia Inst Technol, Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Model extraction; Language model stealing; Model privacy; Adversarial attack; Natural language processing; Performance evaluation;
DOI
10.1016/j.jksuci.2023.02.019
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Designing a query-efficient model extraction strategy to steal models from cloud-based platforms with black-box constraints remains a challenge, especially for language models. In a more realistic setting, a lack of information about the target model's internal parameters, gradients, training data, or even confidence scores prevents attackers from easily copying the target model. Selecting informative and useful examples to train a substitute model is critical to query-efficient model stealing. We propose a novel model extraction framework that fine-tunes a pretrained model based on bidirectional encoder representations from transformers (BERT) while improving query efficiency by utilizing an active learning selection strategy. The active learning strategy, incorporating semantic-based diversity sampling and class-balanced uncertainty sampling, builds an informative subset from the public unannotated dataset as the input for fine-tuning. We apply our method to extract deep classifiers with identical and mismatched architectures as the substitute model under tight and moderate query budgets. Furthermore, we evaluate the transferability of adversarial examples constructed with the help of the models extracted by our method. The results show that our method achieves higher accuracy with fewer queries than existing baselines, and the resulting models exhibit a high transferability success rate of adversarial examples. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
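The selection strategy described in the abstract (class-balanced uncertainty sampling combined with diversity sampling over a public unannotated pool) can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the function names, the entropy-based uncertainty score, and the farthest-point diversity rule are illustrative assumptions standing in for the authors' semantic-based diversity and class-balanced uncertainty components.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy as an uncertainty score; in a hard-label setting
    # this would come from the substitute model, not the victim.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_queries(embeddings, probs, budget):
    """Pick `budget` pool indices for querying the victim model,
    combining a per-class uncertainty quota with diversity filling
    (an illustrative sketch, not the paper's exact selection rule)."""
    n, n_classes = probs.shape
    pseudo = probs.argmax(axis=1)      # substitute's pseudo-labels
    unc = entropy(probs)
    per_class = budget // n_classes    # class-balanced quota
    chosen = []
    for c in range(n_classes):
        idx = np.where(pseudo == c)[0]
        # most uncertain examples within each pseudo-class
        chosen.extend(idx[np.argsort(-unc[idx])][:per_class])
    # fill the remainder by farthest-point (diversity) sampling
    remaining = [i for i in range(n) if i not in set(chosen)]
    while len(chosen) < budget and remaining:
        dists = np.min(
            np.linalg.norm(embeddings[remaining][:, None]
                           - embeddings[chosen][None], axis=2), axis=1)
        chosen.append(remaining.pop(int(np.argmax(dists))))
    return chosen
```

The selected indices would then be sent to the victim API for hard labels, and the labeled pairs used to fine-tune the BERT-based substitute before the next selection round.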
Pages: 10-20
Page count: 11
Related Papers
50 records
  • [1] TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting
    Peng, Hao
    Guo, Shixin
    Zhao, Dandan
    Zhang, Xuhong
    Han, Jianmin
    Ji, Shouling
    Yang, Xing
    Zhong, Ming
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (04) : 3901 - 3916
  • [2] Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data
    Karmakar, Pratik
    Basu, Debabrota
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Towards Query-efficient Black-box Adversarial Attack on Text Classification Models
    Yadollahi, Mohammad Mehdi
    Lashkari, Arash Habibi
    Ghorbani, Ali A.
    2021 18TH INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY AND TRUST (PST), 2021,
  • [4] Query-Efficient Model Inversion Attacks: An Information Flow View
    Xu, Yixiao
    Fang, Binxing
    Li, Mohan
    Liu, Xiaolei
    Tian, Zhihong
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1023 - 1036
  • [5] Query-Efficient Hard-Label Black-Box Attacks Using Biased Sampling
    Liu, Sijia
    Sun, Jian
    Li, Jun
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 3872 - 3877
  • [6] An Efficient Framework by Topic Model for Multi-label Text Classification
    Sun, Wei
    Ran, Xiangying
    Luo, Xiangyang
    Wang, Chongjun
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [7] Feature Extraction of Deep Topic Model for Multi-label Text Classification
    Chen W.
    Liu X.
    Lu M.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (09): : 785 - 792
  • [8] An Effective Label Noise Model for DNN Text Classification
    Jindal, Ishan
    Pressel, Daniel
    Lester, Brian
    Nokleby, Matthew
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3246 - 3256
  • [9] A Multi-Label Text Classification Model with Enhanced Label Information
    Wang, Min
    Gao, Yan
    PROCEEDINGS OF THE 2024 27 TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 329 - 334
  • [10] A Label Information Aware Model for Multi-label Text Classification
    Tian, Xiaoyu
    Qin, Yongbin
    Huang, Ruizhang
    Chen, Yanping
    NEURAL PROCESSING LETTERS, 2024, 56 (05)