RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

Cited: 0
Authors
Jeon, Yunhyeong [1 ]
Jang, Minwoo [1 ]
Lee, Hwanjun [1 ]
Jung, Yeji [1 ]
Jung, Jin [2 ]
Lee, Jonggeon [2 ]
So, Jinin [2 ]
Kim, Daehoon [3 ]
Affiliations
[1] DGIST, Daegu 42988, South Korea
[2] Samsung Electronics, Hwaseong 443743, South Korea
[3] Yonsei University, Seoul 03722, South Korea
Keywords
Graphics processing units; Transformers; Random access memory; Kernel; Computer architecture; Natural language processing; Computational modeling; Vectors; Inverters; Encoding; Processing-in-memory; transformer model; rotary positional embedding
DOI
10.1109/LCA.2025.3535470
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
The emergence of attention-based Transformer models, such as GPT, BERT, and LLaMA, has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A key driver of these improvements is the use of positional embeddings, which capture the contextual relationships between tokens in a sequence. However, existing positional embedding methods face challenges, particularly in managing performance overhead for long sequences and in effectively capturing relationships between adjacent tokens. In response, Rotary Positional Embedding (RoPE) has emerged as a method that embeds positional information accurately without requiring model retraining, even for long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference: we observe that RoPE accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM achieves this with a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-add operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9x performance improvement and 914.1x energy savings compared to conventional systems.
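Although this record is bibliographic, the abstract turns on the structure of the RoPE computation, so a minimal NumPy sketch of standard RoPE (the interleaved even/odd pairing formulation from the RoFormer paper) may help for orientation. The function name, shapes, and pairing convention are illustrative assumptions; this models only the mathematical operation RoPIM accelerates, not the paper's bank-level datapath or data mapping.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each even/odd feature pair of x by a position-dependent angle.

    x: float array of shape (seq_len, dim), dim even; rows are per-token
    query or key vectors. Pair (2i, 2i+1) at position m is rotated by
    m * theta_i, where theta_i = base^(-2i/dim).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"

    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    theta = base ** (-np.arange(0, dim, 2) / dim)    # (dim/2,)
    angles = positions * theta                       # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]           # rearrange into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin        # 2D rotation, 1st coord
    out[:, 1::2] = x_even * sin + x_odd * cos        # 2D rotation, 2nd coord
    return out

# Example: rotate 128 query vectors with head dimension 64.
q = np.random.randn(128, 64)
q_rot = rope(q)
```

The elementwise sin/cos multiply-adds and the even/odd rearrangement above correspond to the multiply-add operations and parallel data rearrangement that, per the abstract, RoPIM performs inside DRAM banks to avoid the data movement that makes RoPE costly on a GPU.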
Pages: 41-44
Page Count: 4