FPGA Implementation of a SIMD-Based Array Processor with Torus Interconnect

Cited by: 0
Authors
Murakami, Yuki [1]
Affiliation
[1] Univ Aizu, Grad Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan
Keywords
Matrix-Matrix Multiply-Add; Convolution; Convolutional Neural Networks; Array Processor
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Matrix computations are a fundamental tool in scientific and engineering applications. Among these applications, Convolutional Neural Networks (CNNs), which can be computed effectively by matrix-matrix multiplication, are becoming popular, so an efficient CNN implementation is highly important. In this study, we designed a parallel processor for matrix computations that uses a torus interconnect topology, and we implemented Cannon's algorithm for matrix-matrix multiply-add. We evaluated the scalability of the proposed processor on a reconfigurable FPGA platform. More precisely, the processor with 8 x 8 functional units with 16-bit floating-point multiply-add units was evaluated on a Cyclone IV FPGA chip and achieved a performance of 27 GFlops. We also implemented CNN calculations on our processor and compared the matrix-based approach with our proposed method. As a result, our method is 25 times faster than the matrix-based approach when the processor has 8 x 8 functional units, the image size is 32 x 32, and the filter size is 5 x 5.
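The abstract names Cannon's algorithm on a torus for matrix-matrix multiply-add. As a rough illustration only, the following Python sketch simulates that algorithm in software under the assumption of one matrix element per functional unit on an n x n torus. The function name `cannon_multiply_add` and the data layout are hypothetical and are not taken from the paper; the actual processor's block size, data path, and 16-bit floating-point arithmetic are not modeled.

```python
def cannon_multiply_add(a, b, c):
    """Return c + a*b for square n x n lists, simulating Cannon's algorithm.

    Assumption (not from the paper): one element of A, B and C per
    processing element (PE) on an n x n torus.
    """
    n = len(a)
    # Initial alignment: row i of A rotates left by i positions,
    # column j of B rotates up by j positions (with wraparound).
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    acc = [row[:] for row in c]  # each PE starts from its element of C

    for _ in range(n):
        # Every PE performs one multiply-add on its local operands.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_loc[i][j] * b_loc[i][j]
        # Torus shifts: A moves one PE to the left, B moves one PE up.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return acc


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 0], [0, 1]]
    print(cannon_multiply_add(a, b, c))
```

In this sketch every shift moves data only between neighbouring PEs, which is the property that makes the algorithm a natural match for a torus interconnect: the wraparound links carry the elements that leave one edge of the array back in at the opposite edge.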
Pages: 244-247
Number of pages: 4