Pruning-and-distillation: One-stage joint compression framework for CNNs via clustering

Cited by: 4
Authors
Niu, Tao [1 ]
Teng, Yinglei [1 ]
Jin, Lei [1 ]
Zou, Panpan [1 ]
Liu, Yiding [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Filter pruning; Clustering; Knowledge distillation; Deep neural networks
DOI: 10.1016/j.imavis.2023.104743
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Network pruning and knowledge distillation, as two effective network compression techniques, have drawn extensive attention due to their success in reducing model complexity. However, previous works regard them as two independent methods and combine them in an isolated manner rather than jointly, leading to sub-optimal optimization. In this paper, we propose a collaborative compression scheme named Pruning-and-Distillation via Clustering (PDC), which integrates pruning and distillation into an end-to-end, single-stage framework that takes advantage of both. Specifically, instead of directly deleting or zeroing out unimportant filters within each layer, we reconstruct them based on clustering, which preserves the learned features as much as possible. The guidance from the teacher is integrated into the pruning process to further improve the generalization of the pruned model, which alleviates the randomness caused by reconstruction to some extent. After convergence, we can equivalently remove the reconstructed filters within each cluster through the proposed channel addition operation. Benefiting from such equivalence, we no longer require a time-consuming fine-tuning step to regain accuracy. Extensive experiments on the CIFAR-10/100 and ImageNet datasets show that our method achieves the best trade-off between performance and complexity compared with other state-of-the-art algorithms. For example, for ResNet-110, we achieve a 61.5% FLOPs reduction with even a 0.14% top-1 accuracy increase on CIFAR-10 and remove 55.2% of FLOPs with only a 0.32% accuracy drop on CIFAR-100. © 2023 Elsevier B.V. All rights reserved.
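The abstract describes three ingredients: clustering-based filter reconstruction instead of deletion, teacher guidance folded into the pruning phase, and an equivalent channel-addition merge after convergence. The following is a minimal sketch of the first two ideas only, assuming PyTorch and scikit-learn k-means; the cluster count, loss weighting, and helper names (`reconstruct_filters_by_clustering`, `pdc_loss`) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of clustering-based filter reconstruction and a pruning-time
# distillation loss, as suggested by the abstract. Hypothetical helper names;
# hyperparameters (n_clusters, T, alpha) are placeholder choices.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def reconstruct_filters_by_clustering(conv: torch.nn.Conv2d, n_clusters: int):
    """Cluster the filters of one conv layer and replace each filter with its
    cluster centroid, so filters in a cluster become identical (reconstruction
    rather than hard deletion)."""
    w = conv.weight.data                          # (out_ch, in_ch, kH, kW)
    flat = w.view(w.size(0), -1).cpu().numpy()    # one row per filter
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        centroid = w[idx].mean(dim=0)
        w[idx] = centroid                         # identical filters -> identical output channels
    # Note: for exact equivalence, biases and BatchNorm parameters of the
    # clustered channels would also need to be tied; after convergence the
    # duplicate channels can be merged by summing the next layer's
    # corresponding input channels ("channel addition").
    return labels


def pdc_loss(student_logits, teacher_logits, targets, T: float = 4.0, alpha: float = 0.5):
    """Cross-entropy plus a temperature-scaled KD term, applied during the
    pruning (reconstruction) phase rather than as a separate stage."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd
```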
Pages: 11
Related papers (50 in total)
  • [1] One-stage object detection knowledge distillation via adversarial learning
    Dong, Na; Zhang, Yongqiang; Ding, Mingli; Xu, Shibiao; Bai, Yancheng
    Applied Intelligence, 2022, 52(4): 4582-4598
  • [2] State multiplicity in one-stage reactive distillation
    Bildea, C. S.; Vos, F. S.
    Revista de Chimie, 2005, 56(11): 1106-1113
  • [3] Compression of Acoustic Model via Knowledge Distillation and Pruning
    Li, Chenxing; Zhu, Lei; Xu, Shuang; Gao, Peng; Xu, Bo
    2018 24th International Conference on Pattern Recognition (ICPR), 2018: 2785-2790
  • [4] One-Stage Incomplete Multi-view Clustering via Late Fusion
    Zhang, Yi; Liu, Xinwang; Wang, Siwei; Liu, Jiyuan; Dai, Sisi; Zhu, En
    Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2717-2725
  • [5] Balanced knowledge distillation for one-stage object detector
    Lee, Sungwook; Lee, Seunghyun; Song, Byung Cheol
    Neurocomputing, 2022, 500: 394-404
  • [6] PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
    Kim, Jangho; Chang, Simyung; Kwak, Nojun
    Interspeech 2021, 2021: 4568-4572
  • [7] Joint Dual Feature Distillation and Gradient Progressive Pruning for BERT compression
    Zhang, Zhou; Lu, Yang; Wang, Tengfei; Wei, Xing; Wei, Zhen
    Neural Networks, 2024, 179
  • [8] A lightweight and efficient one-stage detection framework
    Huang, Jianchen; Chen, Jun; Wang, Han
    Computers & Electrical Engineering, 2023, 105
  • [9] GAN-Knowledge Distillation for One-Stage Object Detection
    Wang, Wanwei; Hong, Wei; Wang, Feng; Yu, Jinke
    IEEE Access, 2020, 8: 60719-60727