FPGA Implementation of a SIMD-Based Array Processor with Torus Interconnect

Author
Murakami, Yuki [1]
Affiliation
[1] Univ Aizu, Grad Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima, Japan
Keywords
Matrix-Matrix Multiply-Add; Convolution; Convolutional Neural Networks; Array Processor;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Matrix computations are a fundamental tool in scientific and engineering applications. Among these applications, Convolutional Neural Networks (CNNs), which can be computed effectively by matrix-matrix multiplication, are becoming popular, so an efficient implementation of CNNs is highly important. In this study, we designed a parallel processor for matrix computations using a torus interconnect topology, and we implemented Cannon's algorithm for matrix-matrix multiply-add on it. We evaluated the scalability of the proposed processor on a reconfigurable FPGA platform. More precisely, the designed processor, with 8 x 8 functional units each containing a 16-bit floating-point multiply-add unit, was evaluated on a Cyclone IV FPGA chip and achieved a performance of 27 GFLOPS. We also implemented CNN calculations on our processor and compared the matrix-based approach with our proposed method. As a result, our method is 25 times faster than the matrix-based approach when the processor has 8 x 8 functional units, the image size is 32 x 32, and the filter size is 5 x 5.
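As a rough illustration of the Cannon's algorithm step mentioned in the abstract, the following is a minimal software sketch, not the authors' hardware design. It assumes a p x p torus in which each processing element (PE) holds exactly one element of A, B, and C; the function name `cannon_multiply_add` and the one-element-per-PE layout are assumptions made for this example only.

```python
def cannon_multiply_add(A, B, C):
    """Return C + A*B for p x p matrices, simulating a p x p torus of PEs.

    Hypothetical helper for illustration: each PE (i, j) holds one element
    of A, one of B, and one accumulator of C.
    """
    p = len(A)

    # Initial alignment (skew): row i of A rotates left by i positions,
    # column j of B rotates up by j positions, with torus wraparound.
    a = [[A[i][(j + i) % p] for j in range(p)] for i in range(p)]
    b = [[B[(i + j) % p][j] for j in range(p)] for i in range(p)]
    c = [row[:] for row in C]  # copy so the caller's C is not modified

    for _ in range(p):
        # Every PE performs one local multiply-add on the data it holds.
        for i in range(p):
            for j in range(p):
                c[i][j] += a[i][j] * b[i][j]

        # Each PE passes its A element to the left neighbour and its
        # B element to the upper neighbour (wraparound links of the torus).
        a = [[a[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        b = [[b[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    return c


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[0, 0], [0, 0]]
    print(cannon_multiply_add(A, B, C))  # [[19, 22], [43, 50]]
```

After the skew, PE (i, j) holds A[i][k] and B[k][j] for the same index k = (i + j) mod p. Each shift advances k by one, so after p steps every PE has accumulated the full dot product using only nearest-neighbour transfers.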
Pages: 244 - 247
Page count: 4