PIMPR: PIM-based Personalized Recommendation with Heterogeneous Memory Hierarchy

Authors
Yang, Tao [1, 2]
Ma, Hui [1]
Zhao, Yilong [1]
Liu, Fangxin [1]
He, Zhezhi [1]
Sun, Xiaoli [4]
Jiang, Li [1, 2, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[4] Zhejiang Inst Sci & Technol Informat, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Recommendation System; PIM; Embedding; Acceleration; Architecture Design;
DOI
10.23919/DATE56975.2023.10137249
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep learning-based personalized recommendation models (DLRMs) dominate AI tasks in data centers. The performance bottleneck of typical DLRMs lies mainly in their memory-bound embedding layers. Resistive Random Access Memory (ReRAM)-based processing-in-memory (PIM) architectures are a natural fit for DLRMs thanks to their in-situ computation and high computational density. However, two challenges remain before DLRMs can fully embrace ReRAM-based PIM architectures: 1) The size of a DLRM's embedding tables can reach tens of GBs, far beyond the memory capacity of typical ReRAM chips. 2) The irregular sparsity in the embedding layers is difficult to exploit in ReRAM crossbar architectures. In this paper, we present a PIM-based DLRM accelerator named PIMPR. PIMPR has a heterogeneous memory hierarchy that leverages the data locality of DLRM embedding layers: ReRAM crossbar-based PIM modules serve as computing caches with high computing parallelism, while DIMM modules hold the entire embedding table. Moreover, we propose a runtime strategy to skip the useless computation induced by sparsity and an offline strategy to balance the workload across ReRAM crossbars. Compared to the state-of-the-art DLRM accelerators SPACE and TRiM, PIMPR achieves average speedups of 2.02x and 1.79x and energy reductions of 5.6x and 5.1x, respectively.
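The abstract does not describe how the runtime skipping or the offline balancing are implemented, so the following is only an illustrative sketch under stated assumptions, not PIMPR's method. It assumes that an embedding lookup with sum pooling can be written as a multi-hot vector times the embedding table, that the table is split by rows into crossbar-sized tiles, and that a tile whose slice of the multi-hot vector is all zeros can be skipped. The balancing step uses a generic greedy heuristic (hottest rows first, each placed on the least-loaded tile) as a stand-in. All function names and parameters (`pooled_lookup`, `tiled_lookup_with_skip`, `balance_rows`, `rows_per_tile`) are hypothetical.

```python
from collections import Counter


def pooled_lookup(table, indices):
    """Reference result: gather the selected rows and sum them element-wise."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def tiled_lookup_with_skip(table, indices, rows_per_tile):
    """Same pooling, computed tile by tile as a multi-hot vector-matrix product.

    Assumption: each tile of `rows_per_tile` rows stands in for one crossbar.
    Tiles whose slice of the multi-hot vector is all zeros are skipped.
    Returns (pooled_vector, number_of_tiles_actually_computed).
    """
    dim = len(table[0])
    counts = Counter(indices)  # multi-hot input: row index -> multiplicity
    out = [0.0] * dim
    active_tiles = 0
    for start in range(0, len(table), rows_per_tile):
        rows = range(start, min(start + rows_per_tile, len(table)))
        if not any(r in counts for r in rows):
            continue  # all-zero input slice: nothing to compute in this tile
        active_tiles += 1
        for r in rows:
            weight = counts.get(r, 0)
            if weight:
                for d in range(dim):
                    out[d] += weight * table[r][d]
    return out, active_tiles


def balance_rows(access_freq, num_tiles):
    """Generic greedy placement (an assumption, not the paper's strategy):
    visit rows from hottest to coldest, assign each to the least-loaded tile."""
    loads = [0] * num_tiles
    placement = {}
    for row in sorted(access_freq, key=lambda r: (-access_freq[r], r)):
        tile = min(range(num_tiles), key=lambda t: (loads[t], t))
        placement[row] = tile
        loads[tile] += access_freq[row]
    return placement, loads


# Toy embedding table: 8 rows, dimension 2.
table = [[float(r), float(2 * r)] for r in range(8)]
indices = [1, 1, 6]  # sparse lookup; row 1 is requested twice
```

With `rows_per_tile=2`, only the tiles that contain rows 1 and 6 are computed, and the pooled vector matches the plain gather-and-sum reference. How well such skipping and balancing work on real hardware depends on details the abstract does not give.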
Pages: 6