Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge

Cited by: 0
Authors
Lu, Yao [1]
Rodriguez, Hiram Rayo Torres [1]
Vogel, Sebastian [2]
van de Waterlaat, Nick [1]
Jancura, Pavol [3]
Affiliations
[1] NXP Semiconductors, Eindhoven, Netherlands
[2] NXP Semiconductors, Munich, Germany
[3] Eindhoven University of Technology, Eindhoven, Netherlands
DOI
10.1145/3615338.3618122
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
Pages: 1-5
Page count: 5