A Low Memory Requirement MobileNets Accelerator Based on FPGA for Auxiliary Medical Tasks

Cited: 3
Authors
Lin, Yanru [1 ]
Zhang, Yanjun [2 ]
Yang, Xu [3 ]
Affiliations
[1] Beijing Inst Technol, Sch Integrated Circuits & Elect, 5 South St, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Sch Cyberspace Sci & Technol, 5 South St, Beijing 100081, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, 5 South St, Beijing 100081, Peoples R China
Source
BIOENGINEERING-BASEL | 2023, Vol. 10, Issue 01
Keywords
convolutional neural network; FPGA; hardware accelerator; MobileNetV2; auxiliary medical tasks;
DOI
10.3390/bioengineering10010028
CLC Classification Numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline Classification Codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Convolutional neural networks (CNNs) are widely used in medical tasks because they can achieve high accuracy, but this accuracy typically comes at the cost of a large number of parameters and operations. Many auxiliary diagnostic and assistance applications must be deployed on portable devices, where the huge number of operations and parameters of a standard CNN becomes an obstacle. MobileNet replaces standard convolution with depthwise separable convolution, which greatly reduces the number of operations and parameters while maintaining relatively high accuracy. Such highly structured models are well suited to FPGA implementation, which can further reduce resource requirements and improve efficiency. Because MobileNets already reduces both parameters and operations substantially, most existing implementations focus on performance rather than on resource requirements. However, many small devices have such limited resources that they cannot run even MobileNet-like efficient networks in the usual way, while many auxiliary medical applications still require a high-performance network running in real time. Hence, a dedicated accelerator structure is needed to further reduce memory and other resource requirements while running MobileNet-like efficient networks. In this paper, a MobileNet accelerator is proposed to minimize the on-chip memory capacity and the amount of data transferred between on-chip and off-chip memory. We propose two configurable computing modules, a Pointwise Convolution Accelerator and a Depthwise Convolution Accelerator, to parallelize the network and reduce the memory requirement with a specific dataflow model. In addition, a new cache usage method is proposed to further reduce on-chip memory use. We implemented the accelerator on a Xilinx XC7Z020, deployed MobileNetV2 on it, and achieved 70.94 FPS with 524.25 KB of on-chip memory usage at 150 MHz.
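The abstract's premise is that depthwise separable convolution replaces one k x k x Cin x Cout kernel with a k x k depthwise stage followed by a 1 x 1 pointwise stage, cutting the operation count by roughly a factor of 1/Cout + 1/k^2. The following minimal Python sketch only illustrates that arithmetic; it is not code from the paper, and the layer shape used is hypothetical rather than taken from the authors' MobileNetV2 deployment.

# Rough multiply-accumulate (MAC) comparison between a standard convolution
# and a depthwise separable convolution (depthwise + pointwise), as used in
# MobileNet. Illustrative only; the layer shape below is hypothetical.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a stride-1, same-padding standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # hypothetical layer shape
    std = standard_conv_macs(h, w, c_in, c_out, k)
    dws = depthwise_separable_macs(h, w, c_in, c_out, k)
    # Reduction factor is roughly 1/c_out + 1/k^2, i.e. about 8-9x for k = 3.
    print(f"standard: {std:,} MACs, separable: {dws:,} MACs, ratio: {std / dws:.1f}x")

The paper's actual contribution lies in how the two stages are mapped to the configurable Pointwise and Depthwise Convolution Accelerator modules and in the dataflow and cache scheme that shrink on-chip memory use, not in this operation-count arithmetic itself.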
Pages: 15
Related Papers
50 records in total
  • [21] Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition
    Toshniwal, Shubham
    Tang, Hao
    Lu, Liang
    Livescu, Karen
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3532 - 3536
  • [22] Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator
    Kim, Heekyung
    Choi, Ken
    2019 IEEE 10TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2019, : 1130 - 1134
  • [23] Low-Memory Requirement and Efficient Face Recognition System Based on DCT Pyramid
    Atta, Randa
    Ghanbari, Mohammad
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2010, 56 (03) : 1542 - 1548
  • [24] VLIW-Based FPGA Computation Fabric with Streaming Memory Hierarchy for Medical Imaging Applications
    Hoozemans, Joost
    Heij, Rolf
    van Straten, Jeroen
    Al-Ars, Zaid
    APPLIED RECONFIGURABLE COMPUTING, 2017, 10216 : 36 - 43
  • [25] Speedy FPGA-Based Packet Classifiers with Low On-Chip Memory Requirements
    Chou, Chih-Hsun
    Pong, Fong
    Tzeng, Nian-Feng
    FPGA 12: PROCEEDINGS OF THE 2012 ACM-SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, 2012, : 11 - 20
  • [26] Low-Power Lane Detection Unit With Sliding-Based Parallel Segment Detection Accelerator for FPGA
    Yun, Heuijee
    Park, Daejin
    IEEE ACCESS, 2024, 12 : 4339 - 4353
  • [27] FPGA Prototyping of Systolic Array-based Accelerator for Low-Precision Inference of Deep Neural Networks
    Kim, Soobeom
    Cho, Seunghwan
    Park, Eunhyeok
    Yoo, Sungjoo
    PROCEEDINGS OF THE 2021 32ND INTERNATIONAL WORKSHOP ON RAPID SYSTEM PROTOTYPING (RSP): SHORTENING THE PATH FROM SPECIFICATION TO PROTOTYPE, 2021, : 1 - 7
  • [28] A Heterogeneous FPGA-based Accelerator Design for Efficient and Low-cost Point Clouds Deep Learning Inference
    Xu, Jinling
    Wang, Yonggui
    Zhouy, Wenbiao
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2725 - 2729
  • [29] FPGA-based low-complexity high-throughput real-time hardware accelerator for robust watermarking
    Ge, Hangqi
    Sha, Jin
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2019, 16 (04) : 813 - 820
  • [30] Prototype of Low Complexity CNN Hardware Accelerator with FPGA-based PYNQ Platform for Dual-Mode Biometrics Recognition
    Chen, Yu-Hsiang
    Fan, Chih-Peng
    Chang, Robert Chen-Hao
    2020 17TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2020), 2020, : 189 - 190