Circular Reconfigurable Parallel Processor for Edge Computing

Cited by: 0
Authors
Li, Yuan [1]
Zhu, Jianbin [1]
Fu, Yao [1]
Lei, Yu [1]
Nagata, Toshio [1]
Braidwood, Ryan [1]
Fu, Haohuan [2]
Zheng, Juepeng [3]
Luk, Wayne [4]
Fan, Hongxiang [4, 5]
Affiliations
[1] AzurEngine Technol, Zhuhai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
[4] Imperial Coll London, London, England
[5] Univ Cambridge, Cambridge, England
Keywords
ARCHITECTURE; COMPUTATION; DEVICES;
DOI
10.1109/ISCA59077.2024.00067
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Graphics Processing Units (GPUs) have emerged as the predominant hardware platforms for massively parallel computing. However, their inherent von Neumann architecture still suffers from performance inefficiency stemming from sequential instruction execution and frequent data transfer overheads within the memory system. These intrinsic architectural flaws impose heavy overheads on latency, area, and energy efficiency, rendering GPUs suboptimal for edge computing applications. To tackle these challenges, this paper introduces a novel circular Reconfigurable Parallel Processor (RPP) to enable massively parallel applications in edge computing with high efficiency. RPP features a novel circular array of reconfigurable compute engines, enabling efficient streaming dataflow processing. In contrast to traditional Coarse-Grained Reconfigurable Architectures (CGRAs), the circular network topology of RPP is formed by linear switch networks with an innovative gasket memory, which reduces complicated network routing overheads while allowing versatile datapath mapping and optimized data reuse. A dedicated hierarchical memory system is proposed to support different memory access patterns and address mapping strategies, enabling flexible data access with high memory efficiency. Several hardware optimizations are further introduced to improve hardware utilization and performance, such as concurrent kernel execution, register split & refill, and heterogeneous scalar & vector computing. To fully utilize the hardware capability of RPP, we develop an end-to-end software stack consisting of a compiler, runtime environment, and different RPP libraries. This software stack is designed to be compatible with the GPGPU computing paradigm, enhancing its potential for broader adoption. Fabricated in a 14 nm process, RPP occupies an area of 119 mm² and operates at a maximum power of 15 W with a 1 GHz clock frequency. Runtime measurements of various workloads show that RPP achieves up to 27.5x higher energy efficiency than Nvidia edge GPUs in deep learning inference and up to 14062x lower latency than an AMD Ryzen 5 CPU in linear algebra operations.
Pages: 863-875
Page count: 13