FPGA Implementation of a SIMD-Based Array Processor with Torus Interconnect

Author
Murakami, Yuki [1]
Affiliation
[1] Univ Aizu, Grad Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan
Keywords
Matrix-Matrix Multiply-Add; Convolution; Convolutional Neural Networks; Array Processor
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline code
081202; 0835
Abstract
Matrix computations are a fundamental tool in scientific and engineering applications. Among these applications, Convolutional Neural Networks (CNNs), which can be computed efficiently with matrix-matrix multiplications, are becoming popular, so an efficient implementation of CNNs is highly important. In this study, we designed a parallel processor for matrix computations based on a torus interconnect topology and implemented Cannon's algorithm for matrix-matrix multiply-add on it. We evaluated the scalability of the proposed processor on a reconfigurable FPGA platform. Specifically, a processor with 8 × 8 functional units, each containing a 16-bit floating-point multiply-add unit, was evaluated on a Cyclone IV FPGA chip and achieved a performance of 27 GFlops. We also implemented CNN calculations on the processor and compared our proposed method with the matrix-based approach. With 8 × 8 functional units, a 32 × 32 image, and a 5 × 5 filter, our method is 25 times faster than the matrix-based approach.
Pages: 244–247
Page count: 4
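The abstract names Cannon's algorithm for matrix-matrix multiply-add on a torus. The following is a minimal Python sketch of the textbook algorithm, simulating an n × n grid of processing elements (PEs) in software. It assumes square matrices whose size equals the grid dimension, with one matrix element per PE; the function name `cannon_multiply_add` is hypothetical and is not taken from the paper's hardware design, which uses 16-bit floating-point units rather than Python numbers.

```python
from typing import List

Matrix = List[List[float]]


def cannon_multiply_add(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return C + A*B, computed as on an n x n torus of PEs (sketch only)."""
    n = len(a)
    # Assumption: each PE (i, j) holds exactly one element of A, B and C.
    # Initial alignment (skew): row i of A is shifted left by i positions,
    # column j of B is shifted up by j positions.
    a_loc = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    # Copy C so the caller's matrix is not modified.
    c_loc = [row[:] for row in c]
    for _ in range(n):
        # SIMD step: every PE performs one multiply-add in lockstep.
        for i in range(n):
            for j in range(n):
                c_loc[i][j] += a_loc[i][j] * b_loc[i][j]
        # Nearest-neighbour torus shifts with wrap-around:
        # A moves one PE to the left, B moves one PE up.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c_loc


if __name__ == "__main__":
    print(cannon_multiply_add([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]))
```

For the 2 × 2 example the call returns `[[20, 23], [44, 51]]`, that is, A·B = `[[19, 22], [43, 50]]` plus 1 in every entry. Each of the n iterations needs only transfers between neighbouring PEs, which is why the wrap-around links of a torus suit Cannon's algorithm; the final shift after the last multiply-add is redundant but kept here for simplicity.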