FPGA Implementation of a SIMD-Based Array Processor with Torus Interconnect

Cited by: 0
Author
Murakami, Yuki [1]
Affiliation
[1] Univ Aizu, Grad Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan
Keywords
Matrix-Matrix Multiply-Add; Convolution; Convolutional Neural Networks; Array Processor
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Matrix computations are a fundamental tool in scientific and engineering applications. Among these applications, Convolutional Neural Networks (CNNs), which can be computed effectively by matrix-matrix multiplication, have become popular, so an efficient CNN implementation is highly important. In this study, we designed a parallel processor for matrix computations that uses a torus interconnect topology, and we implemented Cannon's algorithm for matrix-matrix multiply-add on it. We evaluated the scalability of the proposed processor on a reconfigurable FPGA platform. More precisely, the designed processor, with 8 x 8 functional units each containing a 16-bit floating-point multiply-add unit, was evaluated on a Cyclone IV FPGA chip and achieved a performance of 27 GFlops. We also implemented CNN calculations on our processor and compared the matrix-based approach with our proposed method. As a result, our method is 25 times faster than the matrix-based approach when the processor has 8 x 8 functional units, the image size is 32 x 32, and the filter size is 5 x 5.
Pages: 244-247
Page count: 4
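
The abstract names Cannon's algorithm on a torus interconnect as the basis for the matrix-matrix multiply-add. As a rough illustration only, the following pure-Python sketch simulates the data movement of Cannon's algorithm on an n x n torus, assuming one matrix element per processing element. The function name `cannon_multiply_add` and the list-of-lists layout are hypothetical choices for this sketch and are not taken from the paper; the actual hardware design, number format, and block mapping are not described in this record.

```python
def cannon_multiply_add(A, B, C):
    """Simulate Cannon's algorithm on an n x n torus; returns C + A*B.

    Assumption for this sketch: one matrix element per processing
    element (PE), with PE (i, j) holding a[i][j], b[i][j], c[i][j].
    """
    n = len(A)
    # Initial skew: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions (with wraparound).
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [row[:] for row in C]  # copy so the caller's C is not modified

    for _ in range(n):
        # Every PE performs one local multiply-add.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
        # Torus shift: A moves one step left, B moves one step up.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[0, 0], [0, 0]]
    print(cannon_multiply_add(A, B, C))  # → [[19, 22], [43, 50]]
```

After the initial skew, each PE only ever exchanges data with its left and upper neighbours, which is why the wraparound links of a torus suit this algorithm: n shift-and-accumulate steps complete the full product without any global communication.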