An OpenCL-Based FPGA Accelerator for Faster R-CNN

Cited by: 5
Authors
An, Jianjing [1, 2]
Zhang, Dezheng [1, 2]
Xu, Ke [1, 2]
Wang, Dong [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
convolutional neural network; Faster R-CNN; FPGA; hardware accelerator
DOI
10.3390/e24101346
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the accompanying research focuses on hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still few accelerator designs for the faster regions with CNN features (Faster R-CNN) algorithm. Moreover, the inherently high computational and memory complexity of CNNs makes efficient accelerators challenging to design. This paper proposes a software-hardware co-design scheme based on OpenCL to implement the Faster R-CNN object detection algorithm on an FPGA. First, we design an efficient, deeply pipelined FPGA hardware accelerator that can implement Faster R-CNN with different backbone networks. Then, we propose an optimized hardware-aware software algorithm, including fixed-point quantization, layer fusion, and a multi-batch regions of interest (RoIs) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at a working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.
Pages: 18
Related papers
50 in total
  • [31] Lithology Identification Based on Improved Faster R-CNN
    Fu, Peng
    Wang, Jiyang
    MINERALS, 2024, 14 (09)
  • [32] Faster R-CNN Based Microscopic Cell Detection
    Yang, Su
    Fang, Bin
    Tang, Wei
    Wu, Xuegang
    Qian, Jiye
    Yang, Weibin
    2017 INTERNATIONAL CONFERENCE ON SECURITY, PATTERN ANALYSIS, AND CYBERNETICS (SPAC), 2017, : 345 - 350
  • [33] Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
    Tapiador-Morales, Ricardo
    Rios-Navarro, Antonio
    Linares-Barranco, Alejandro
    Kim, Minkyu
    Kadetotad, Deepak
    Seo, Jae-sun
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2017, PT II, 2017, 10306 : 271 - 282
  • [34] Pedestrian detection method based on Faster R-CNN
    Zhang, Hui
    Du, Yu
    Ning, Shurong
    Zhang, Yonghua
    Yang, Shuo
    Du, Chen
    2017 13TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2017, : 427 - 430
  • [35] Insulator Defect Recognition Based on Faster R-CNN
    Wang, Yifan
    Li, Zhongxu
    Yang, Xuecheng
    Luo, Ning
    Zhao, Yu
    Zhou, Gang
    PROCEEDINGS OF THE 2020 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (CITS), 2020, : 103 - 106
  • [36] Local keypoint-based Faster R-CNN
    Xintao Ding
    Qingde Li
    Yongqiang Cheng
    Jinbao Wang
    Weixin Bian
    Biao Jie
    Applied Intelligence, 2020, 50 : 3007 - 3022
  • [37] A Supernova Detection Implementation based on Faster R-CNN
    Wu, Tianyuan
    2020 INTERNATIONAL CONFERENCE ON BIG DATA & ARTIFICIAL INTELLIGENCE & SOFTWARE ENGINEERING (ICBASE 2020), 2020, : 390 - 393
  • [38] Local keypoint-based Faster R-CNN
    Ding, Xintao
    Li, Qingde
    Cheng, Yongqiang
    Wang, Jinbao
    Bian, Weixin
    Jie, Biao
    APPLIED INTELLIGENCE, 2020, 50 (10) : 3007 - 3022
  • [39] In-FPGA Instrumentation Framework for OpenCL-Based Designs
    Bensalem, Hachem
    Blaquiere, Yves
    Savaria, Yvon
    IEEE ACCESS, 2020, 8 (08) : 212979 - 212994
  • [40] Evaluation of an OpenCL-Based FPGA Platform for Particle Filter
    Tatsumi, Shunsuke
    Hariyama, Masanori
    Ikoma, Norikazu
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2016, 20 (05) : 743 - 754