Query-efficient model extraction for text classification model in a hard label setting

Cited: 0
|
Authors
Peng, Hao [1 ]
Guo, Shixin [1 ]
Zhao, Dandan [1 ]
Wu, Yiming [3 ]
Han, Jianming [1 ]
Wang, Zhe [1 ]
Ji, Shouling [2 ,4 ]
Zhong, Ming [1 ]
Affiliations
[1] Zhejiang Normal Univ, Coll Comp Sci & Technol, Jinhua 321004, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310027, Zhejiang, Peoples R China
[4] Georgia Inst Technol, Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Model extraction; Language model stealing; Model privacy; Adversarial attack; Natural language processing; Performance evaluation;
DOI
10.1016/j.jksuci.2023.02.019
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Designing a query-efficient model extraction strategy to steal models from cloud-based platforms with black-box constraints remains a challenge, especially for language models. In a more realistic setting, a lack of information about the target model's internal parameters, gradients, training data, or even confidence scores prevents attackers from easily copying the target model. Selecting informative and useful examples to train a substitute model is critical to query-efficient model stealing. We propose a novel model extraction framework that fine-tunes a pretrained model based on bidirectional encoder representations from transformers (BERT) while improving query efficiency by utilizing an active learning selection strategy. The active learning strategy, incorporating semantic-based diversity sampling and class-balanced uncertainty sampling, builds an informative subset from the public unannotated dataset as the input for fine-tuning. We apply our method to extract deep classifiers with identical and mismatched architectures as the substitute model under tight and moderate query budgets. Furthermore, we evaluate the transferability of adversarial examples constructed with the help of the models extracted by our method. The results show that our method achieves higher accuracy with fewer queries than existing baselines, and the resulting models exhibit a high transferability success rate of adversarial examples. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
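The selection strategy described in the abstract (class-balanced uncertainty sampling combined with diversity sampling over a public unannotated pool) can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the function names, the entropy-based uncertainty score, and the farthest-point diversity rule are illustrative assumptions standing in for the authors' semantic-based diversity and class-balanced uncertainty components.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy as an uncertainty score; in a hard-label setting
    # this would come from the substitute model, not the victim.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_queries(embeddings, probs, budget):
    """Pick `budget` pool indices for querying the victim model,
    combining a per-class uncertainty quota with diversity filling
    (an illustrative sketch, not the paper's exact selection rule)."""
    n, n_classes = probs.shape
    pseudo = probs.argmax(axis=1)      # substitute's pseudo-labels
    unc = entropy(probs)
    per_class = budget // n_classes    # class-balanced quota
    chosen = []
    for c in range(n_classes):
        idx = np.where(pseudo == c)[0]
        # most uncertain examples within each pseudo-class
        chosen.extend(idx[np.argsort(-unc[idx])][:per_class])
    # fill the remainder by farthest-point (diversity) sampling
    remaining = [i for i in range(n) if i not in set(chosen)]
    while len(chosen) < budget and remaining:
        dists = np.min(
            np.linalg.norm(embeddings[remaining][:, None]
                           - embeddings[chosen][None], axis=2), axis=1)
        chosen.append(remaining.pop(int(np.argmax(dists))))
    return chosen
```

The selected indices would then be sent to the victim API for hard labels, and the labeled pairs used to fine-tune the BERT-based substitute before the next selection round.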
Pages: 10-20
Page count: 11
Related Papers
50 records
  • [1] TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting
    Peng, Hao
    Guo, Shixin
    Zhao, Dandan
    Zhang, Xuhong
    Han, Jianmin
    Ji, Shouling
    Yang, Xing
    Zhong, Ming
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (04) : 3901 - 3916
  • [2] Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data
    Karmakar, Pratik
    Basu, Debabrota
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Towards Query-efficient Black-box Adversarial Attack on Text Classification Models
    Yadollahi, Mohammad Mehdi
    Lashkari, Arash Habibi
    Ghorbani, Ali A.
    2021 18TH INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY AND TRUST (PST), 2021,
  • [4] Query-Efficient Model Inversion Attacks: An Information Flow View
    Xu, Yixiao
    Fang, Binxing
    Li, Mohan
    Liu, Xiaolei
    Tian, Zhihong
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1023 - 1036
  • [5] Query-Efficient Hard-Label Black-Box Attacks Using Biased Sampling
    Liu, Sijia
    Sun, Jian
    Li, Jun
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 3872 - 3877
  • [6] An Efficient Framework by Topic Model for Multi-label Text Classification
    Sun, Wei
    Ran, Xiangying
    Luo, Xiangyang
    Wang, Chongjun
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [7] Feature Extraction of Deep Topic Model for Multi-label Text Classification
    Chen W.
    Liu X.
    Lu M.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (09): : 785 - 792
  • [8] An Effective Label Noise Model for DNN Text Classification
    Jindal, Ishan
    Pressel, Daniel
    Lester, Brian
    Nokleby, Matthew
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3246 - 3256
  • [9] A Multi-Label Text Classification Model with Enhanced Label Information
    Wang, Min
    Gao, Yan
    PROCEEDINGS OF THE 2024 27 TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 329 - 334
  • [10] A Label Information Aware Model for Multi-label Text Classification
    Tian, Xiaoyu
    Qin, Yongbin
    Huang, Ruizhang
    Chen, Yanping
    NEURAL PROCESSING LETTERS, 2024, 56 (05)