A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions

Authors
Ahmed, Hameeza [1]
Santos, Paulo C. [2]
Lima, Joao P. C. [2]
Moura, Rafael F. [2]
Alves, Marco A. Z. [3]
Beck, Antonio C. S. [2]
Carro, Luigi [2]
Affiliations
[1] NED Univ, Dept Comp & Informat Syst Engn, Karachi, Pakistan
[2] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[3] Univ Fed Parana, Dept Informat, Curitiba, Parana, Brazil
Keywords
Compiler; Processing in Memory; Near-data computing; Vector instructions; SIMD; 3D-Stacked memories
DOI
10.23919/date.2019.8714956
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Although Processing-in-Memory (PIM) is not a new technique, the advent of 3D-stacked technologies, which integrate large memories with logic circuitry able to process large amounts of data, has revived it. PIM increases performance while reducing energy consumption when dealing with large amounts of data. Although several PIM designs are available in the literature, implementing them effectively still burdens the programmer. In addition, multiple PIM instances are required to take advantage of the internal 3D-stacked memories, which further increases the challenges programmers face. This work therefore presents the Processing-In-Memory cOmpiler (PRIMO). Our compiler efficiently exploits large vector units on a PIM architecture directly from the original code. PRIMO automatically selects suitable PIM operations, allowing them to be offloaded automatically. Moreover, PRIMO takes multiple PIM instances into account, selecting the most suitable instance while reducing internal communication between different PIM units. The compilation results for different benchmarks show that PRIMO exploits large vectors while achieving near-optimal performance compared to the ideal execution for the case-study PIM. PRIMO achieves a speedup of 38x for specific kernels and an average of 11.8x for a set of benchmarks from the PolyBench suite.
Pages: 564-569
Number of pages: 6
Related papers
50 in total
  • [21] A Design Framework for Processing-In-Memory Accelerator
    Gao, Di
    Shen, Tianhao
    Zhuo, Cheng
    2018 ACM/IEEE INTERNATIONAL WORKSHOP ON SYSTEM LEVEL INTERCONNECT PREDICTION (SLIP), 2018,
  • [22] PIMCH: Cooperative Memory Prefetching in Processing-In-Memory Architecture
    Xu, Sheng
    Wang, Ying
    Han, Yinhe
    Li, Xiaowei
    2018 23RD ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2018, : 209 - 214
  • [23] PIMSYN: Synthesizing Processing-in-memory CNN Accelerators
    Li, Wanqian
    Sun, Xiaotian
    Wang, Xinyu
    Wang, Lei
    Han, Yinhe
    Chen, Xiaoming
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [24] A Survey of Resource Management for Processing-In-Memory and Near-Memory Processing Architectures
    Khan, Kamil
    Pasricha, Sudeep
    Kim, Ryan Gary
    JOURNAL OF LOW POWER ELECTRONICS AND APPLICATIONS, 2020, 10 (04) : 1 - 31
  • [25] SPIMulator: A Spintronic Processing-in-memory Simulator for Racetracks
    Bera, Pavia
    Cahoon, Stephen
    Bhanja, Sanjukta
    Jones, Alex
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (06)
  • [26] Optimal Data Allocation for Graph Processing in Processing-in-Memory Systems
    Li, Zerun
    Chen, Xiaoming
    Han, Yinhe
    27TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2022, 2022, : 238 - 243
  • [27] PIMSim: A Flexible and Detailed Processing-in-Memory Simulator
    Xu, Sheng
    Chen, Xiaoming
    Wang, Ying
    Han, Yinhe
    Qian, Xuehai
    Li, Xiaowei
    IEEE COMPUTER ARCHITECTURE LETTERS, 2019, 18 (01) : 6 - 9
  • [28] Combinators and processing-in-memory: An unconventional basis for avoiding the memory wall
    Narayanaswamy, L
    Kogge, PM
    UNCONVENTIONAL MODELS OF COMPUTATION, 1998, : 293 - 308
  • [29] On Consistency for Bulk-Bitwise Processing-in-Memory
    Perach, Ben
    Ronen, Ronny
    Kvatinsky, Shahar
    2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, 2023, : 705 - 717
  • [30] Resistive GP-SIMD Processing-In-Memory
    Morad, Amir
    Yavits, Leonid
    Kvatinsky, Shahar
    Ginosar, Ran
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2016, 12 (04)