PIMPR: PIM-based Personalized Recommendation with Heterogeneous Memory Hierarchy

Cited by: 0

Authors:

Yang, Tao [1,2]

Ma, Hui [1]

Zhao, Yilong [1]

Liu, Fangxin [1]

He, Zhezhi [1]

Sun, Xiaoli [4]

Jiang, Li [1,2,3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Shanghai Qi Zhi Inst, Shanghai, Peoples R China

[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China

[4] Zhejiang Inst Sci & Technol Informat, Hangzhou, Peoples R China

Source:

2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Recommendation System; PIM; Embedding; Acceleration; Architecture Design;

DOI:

10.23919/DATE56975.2023.10137249

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Deep learning-based personalized recommendation models (DLRMs) are dominating AI tasks in data centers. The performance bottleneck of typical DLRMs lies mainly in the memory-bound embedding layers. Resistive Random Access Memory (ReRAM)-based processing-in-memory (PIM) architecture is a natural fit for DLRMs thanks to its in-situ computation and high computational density. However, two challenges remain before DLRMs can fully embrace ReRAM-based PIM architectures: 1) The size of a DLRM's embedding tables can reach tens of GBs, far beyond the memory capacity of typical ReRAM chips. 2) The irregular sparsity of the embedding layers is difficult to exploit in ReRAM crossbar architectures. In this paper, we present a PIM-based DLRM accelerator named PIMPR. PIMPR has a heterogeneous memory hierarchy that leverages the data locality of DLRM embedding layers: ReRAM crossbar-based PIM modules serve as computing caches with high computing parallelism, while DIMM modules hold the entire embedding table. Moreover, we propose a runtime strategy to skip the useless calculations induced by the sparsity and an offline strategy to balance the workload across ReRAM crossbars. Compared to the state-of-the-art DLRM accelerators SPACE and TRiM, PIMPR achieves on average 2.02x and 1.79x speedup and 5.6x and 5.1x energy reduction, respectively.
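The two-tier idea in the abstract (a small, fast tier caching hot embedding rows in front of a large tier that holds the whole table) can be illustrated with a toy software model. The sketch below is not the PIMPR design: the class name `TwoTierEmbedding`, the LRU replacement policy, and the hit/miss counters are all assumptions made for illustration, and it does not model ReRAM crossbars, sparsity skipping, or workload balancing. It only shows how a sum-pooled embedding lookup benefits when repeated indices are served from the small tier.

```python
from collections import OrderedDict


class TwoTierEmbedding:
    """Toy model (hypothetical): small LRU tier in front of a full table."""

    def __init__(self, table, cache_rows):
        self.table = table            # full table; stands in for the large tier
        self.cache = OrderedDict()    # hot rows; stands in for the small fast tier
        self.cache_rows = cache_rows  # capacity of the small tier, in rows
        self.hits = 0
        self.misses = 0

    def _fetch(self, idx):
        # Serve from the small tier when possible and refresh LRU order.
        if idx in self.cache:
            self.hits += 1
            self.cache.move_to_end(idx)
            return self.cache[idx]
        # Otherwise read from the full table and install the row.
        self.misses += 1
        row = self.table[idx]
        self.cache[idx] = row
        if len(self.cache) > self.cache_rows:
            self.cache.popitem(last=False)  # evict least recently used row
        return row

    def pooled_lookup(self, indices):
        # Sum-pool only the selected rows; unselected rows are never touched.
        dim = len(self.table[0])
        out = [0.0] * dim
        for idx in indices:
            row = self._fetch(idx)
            for d in range(dim):
                out[d] += row[d]
        return out


# Tiny usage example: 8 rows of dimension 2, small tier holds 2 rows.
table = [[float(i), float(2 * i)] for i in range(8)]
store = TwoTierEmbedding(table, cache_rows=2)
first = store.pooled_lookup([1, 3, 1])
second = store.pooled_lookup([3, 5])
```

In this toy run, the repeated indices 1 and 3 are served from the small tier, which is the locality effect the abstract refers to; the real accelerator's policies and hardware behavior may differ substantially.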


Pages: 6
