Pruning-and-distillation: One-stage joint compression framework for CNNs via clustering

Cited by: 4
Authors
Niu, Tao [1 ]
Teng, Yinglei [1 ]
Jin, Lei [1 ]
Zou, Panpan [1 ]
Liu, Yiding [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Filter pruning; Clustering; Knowledge distillation; Deep neural networks
DOI: 10.1016/j.imavis.2023.104743
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Network pruning and knowledge distillation, as two effective network compression techniques, have drawn extensive attention due to their success in reducing model complexity. However, previous works regard them as two independent methods and combine them in an isolated manner rather than jointly, leading to sub-optimal optimization. In this paper, we propose a collaborative compression scheme named Pruning-and-Distillation via Clustering (PDC), which integrates pruning and distillation into an end-to-end, single-stage framework that takes advantage of both. Specifically, instead of directly deleting or zeroing out unimportant filters within each layer, we reconstruct them based on clustering, which preserves the learned features as much as possible. The guidance from the teacher is integrated into the pruning process to further improve the generalization of the pruned model, which alleviates the randomness caused by reconstruction to some extent. After convergence, we can equivalently remove the reconstructed filters within each cluster through the proposed channel addition operation. Benefiting from such equivalence, we no longer require a time-consuming fine-tuning step to regain accuracy. Extensive experiments on the CIFAR-10/100 and ImageNet datasets show that our method achieves the best trade-off between performance and complexity compared with other state-of-the-art algorithms. For example, for ResNet-110, we achieve a 61.5% FLOPs reduction with even a 0.14% top-1 accuracy increase on CIFAR-10 and remove 55.2% of FLOPs with only a 0.32% accuracy drop on CIFAR-100. © 2023 Elsevier B.V. All rights reserved.
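The abstract describes three ingredients: clustering-based filter reconstruction instead of deletion, teacher guidance folded into the pruning phase, and an equivalent channel-addition merge after convergence. The following is a minimal sketch of the first two ideas only, assuming PyTorch and scikit-learn k-means; the cluster count, loss weighting, and helper names (`reconstruct_filters_by_clustering`, `pdc_loss`) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of clustering-based filter reconstruction and a pruning-time
# distillation loss, as suggested by the abstract. Hypothetical helper names;
# hyperparameters (n_clusters, T, alpha) are placeholder choices.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def reconstruct_filters_by_clustering(conv: torch.nn.Conv2d, n_clusters: int):
    """Cluster the filters of one conv layer and replace each filter with its
    cluster centroid, so filters in a cluster become identical (reconstruction
    rather than hard deletion)."""
    w = conv.weight.data                          # (out_ch, in_ch, kH, kW)
    flat = w.view(w.size(0), -1).cpu().numpy()    # one row per filter
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        centroid = w[idx].mean(dim=0)
        w[idx] = centroid                         # identical filters -> identical output channels
    # Note: for exact equivalence, biases and BatchNorm parameters of the
    # clustered channels would also need to be tied; after convergence the
    # duplicate channels can be merged by summing the next layer's
    # corresponding input channels ("channel addition").
    return labels


def pdc_loss(student_logits, teacher_logits, targets, T: float = 4.0, alpha: float = 0.5):
    """Cross-entropy plus a temperature-scaled KD term, applied during the
    pruning (reconstruction) phase rather than as a separate stage."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd
```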
Pages: 11
Related papers (50 in total)
  • [1] One-stage object detection knowledge distillation via adversarial learning
    Dong, Na; Zhang, Yongqiang; Ding, Mingli; Xu, Shibiao; Bai, Yancheng
    Applied Intelligence, 2022, 52(4): 4582-4598
  • [2] State multiplicity in one-stage reactive distillation
    Bildea, C. S.; Vos, F. S.
    Revista de Chimie, 2005, 56(11): 1106-1113
  • [3] Compression of Acoustic Model via Knowledge Distillation and Pruning
    Li, Chenxing; Zhu, Lei; Xu, Shuang; Gao, Peng; Xu, Bo
    2018 24th International Conference on Pattern Recognition (ICPR), 2018: 2785-2790
  • [4] One-Stage Incomplete Multi-view Clustering via Late Fusion
    Zhang, Yi; Liu, Xinwang; Wang, Siwei; Liu, Jiyuan; Dai, Sisi; Zhu, En
    Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2717-2725
  • [5] Balanced knowledge distillation for one-stage object detector
    Lee, Sungwook; Lee, Seunghyun; Song, Byung Cheol
    Neurocomputing, 2022, 500: 394-404
  • [6] PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
    Kim, Jangho; Chang, Simyung; Kwak, Nojun
    Interspeech 2021, 2021: 4568-4572
  • [7] Joint Dual Feature Distillation and Gradient Progressive Pruning for BERT compression
    Zhang, Zhou; Lu, Yang; Wang, Tengfei; Wei, Xing; Wei, Zhen
    Neural Networks, 2024, 179
  • [8] A lightweight and efficient one-stage detection framework
    Huang, Jianchen; Chen, Jun; Wang, Han
    Computers & Electrical Engineering, 2023, 105
  • [9] GAN-Knowledge Distillation for One-Stage Object Detection
    Wang, Wanwei; Hong, Wei; Wang, Feng; Yu, Jinke
    IEEE Access, 2020, 8: 60719-60727