Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Cited by: 147
|
Authors
Zhang, Chen [1,2,3]
Fang, Zhenman [2]
Zhou, Peipei [2]
Pan, Peichen [3]
Cong, Jason [1,2,3]
Affiliations
[1] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing, Peoples R China
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Falcon Comp Inc, Los Angeles, CA USA
Keywords
COPROCESSOR;
DOI
10.1145/2966986.2967011
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both the computation-intensive convolutional layers and the communication-intensive fully connected (FCN) layers. Second, we design Caffeine to maximize the utilization of the underlying FPGA computing and bandwidth resources, with a key focus on bandwidth optimization through memory access reorganization, which has not been studied in prior work. Moreover, we implement Caffeine in portable high-level synthesis and provide various hardware/software definable parameters for user configuration. Finally, we integrate Caffeine into the industry-standard deep learning framework Caffe. We evaluate Caffeine and its Caffe integration by implementing the VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 365 GOPS on a Xilinx KU060 FPGA and 636 GOPS on a Virtex7 690t FPGA; to the best of our knowledge, these are the best published results. We achieve more than 100x speedup on FCN layers over previous FPGA accelerators. An end-to-end evaluation with the Caffe integration shows up to 7.3x performance and 43.5x energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency than a GPU implementation, on a medium-sized FPGA (KU060). Performance projections for a system with a high-end FPGA (Virtex7 690t) show even higher gains.
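The uniformed representation described in the abstract maps both convolutional and fully connected layers onto a single matrix-multiplication kernel. A minimal sketch of the underlying idea, a plain im2col lowering in NumPy, not Caffeine's actual FPGA implementation:

```python
import numpy as np

def im2col_conv(x, w):
    """2-D convolution (valid padding, stride 1) lowered to one matrix multiply."""
    H, W = x.shape
    K = w.shape[0]                      # square K x K kernel
    out_h, out_w = H - K + 1, W - K + 1
    # im2col: each column of `cols` holds one flattened K*K input patch.
    cols = np.empty((K * K, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + K, j:j + K].ravel()
    # An FCN layer is the degenerate case where the whole input is a
    # single "patch", so the same matmul kernel serves both layer types.
    return (w.ravel() @ cols).reshape(out_h, out_w)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2))
print(im2col_conv(x, w))  # each output element sums a 2x2 window of x
```

This is the standard convolution-as-GEMM lowering; the paper's contribution lies in how the resulting matrix tiles are mapped onto FPGA compute and bandwidth resources, which this host-side sketch does not model.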
Pages: 8
Related papers
(50 entries)
  • [31] Convergence of deep convolutional neural networks
    Xu, Yuesheng
    Zhang, Haizhang
    NEURAL NETWORKS, 2022, 153: 553-563
  • [32] Fusion of Deep Convolutional Neural Networks
    Suchy, Robert
    Ezekiel, Soundararajan
    Cornacchia, Maria
    2017 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2017
  • [33] Representation Visualization of Convolutional Neural Networks: A Survey
    Si, N.-W.
    Zhang, W.-L.
    Qu, D.
    Luo, X.-Y.
    Chang, H.-Y.
    Niu, T.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48(08): 1890-1892
  • [34] Problems of representation of electrocardiograms in convolutional neural networks
    Sereda, Iana
    Alekseev, Sergey
    Koneva, Aleksandra
    Khorkin, Alexey
    Osipov, Grigory
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [35] Lateral Representation Learning in Convolutional Neural Networks
    Ballester, Pedro
    Correa, Ulisses Brisolara
    Araujo, Ricardo Matsumura
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [36] Evaluation of deep convolutional neural networks for in situ hybridization gene expression image representation
    Abed-Esfahani, Pegah
    Darwin, Benjamin C.
    Howard, Derek
    Wang, Nick
    Kim, Ethan
    Lerch, Jason
    French, Leon
    PLOS ONE, 2022, 17(01)
  • [37] Implementation-Independent Representation for Deep Convolutional Neural Networks and Humans in Processing Faces
    Song, Yiying
    Qu, Yukun
    Xu, Shan
    Liu, Jia
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2021, 14
  • [38] A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration
    Ghimire, Deepak
    Kil, Dayoung
    Kim, Seong-heum
    ELECTRONICS, 2022, 11(06)
  • [39] A Fourier domain acceleration framework for convolutional neural networks
    Lin, Jinhua
    Ma, Lin
    Yao, Yu
    NEUROCOMPUTING, 2019, 364: 254-268
  • [40] Acceleration and implementation of convolutional neural networks based on FPGA
    Zhao, Sijie
    Gao, Shangshang
    Wang, Rugang
    Wang, Yuanyuan
    Zhou, Feng
    Guo, Naihong
    DIGITAL SIGNAL PROCESSING, 2023, 141