RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

Cited by: 0
Authors
Jeon, Yunhyeong [1 ]
Jang, Minwoo [1 ]
Lee, Hwanjun [1 ]
Jung, Yeji [1 ]
Jung, Jin [2 ]
Lee, Jonggeon [2 ]
So, Jinin [2 ]
Kim, Daehoon [3 ]
Affiliations
[1] DGIST, Daegu 42988, South Korea
[2] Samsung Electronics, Hwaseong 443743, South Korea
[3] Yonsei Univ, Seoul 03722, South Korea
Keywords
Graphics processing units; Transformers; Random access memory; Kernel; Computer architecture; Natural language processing; Computational modeling; Vectors; Inverters; Encoding; Processing-in-memory; transformer model; rotary positional embedding
DOI
10.1109/LCA.2025.3535470
CLC Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
The emergence of attention-based Transformer models, such as GPT, BERT, and LLaMA, has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A critical factor driving these improvements is the use of positional embeddings, which capture the contextual relationships between tokens in a sequence. However, current positional embedding methods face challenges, particularly in managing performance overhead for long sequences and effectively capturing relationships between adjacent tokens. In response, Rotary Positional Embedding (RoPE) has emerged as a method that embeds positional information accurately and without requiring model retraining, even for long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference: we observe that RoPE accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM uses a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. Additionally, RoPIM employs an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9x performance improvement and 914.1x energy savings compared to conventional systems.
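RoPE rotates each adjacent pair of query/key elements by a position-dependent angle, which reduces to the elementwise multiply-add pattern the abstract refers to. The sketch below is a minimal, illustrative NumPy implementation of that standard formulation; the function name, tensor shapes, and the base of 10000 are common conventions assumed here, not details taken from the RoPIM design.

    # Minimal sketch of Rotary Positional Embedding (RoPE), standard pairwise
    # rotation; illustrative only, not the RoPIM paper's implementation.
    import numpy as np

    def rope(x, base=10000.0):
        """Apply RoPE to x of shape (seq_len, head_dim), head_dim even."""
        seq_len, dim = x.shape
        # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
        inv_freq = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
        angles = np.outer(np.arange(seq_len), inv_freq)       # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]                       # even/odd element pairs
        # Each pair (x1, x2) is rotated by its position-dependent angle;
        # this multiply-add structure is what a PIM accelerator can exploit.
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # Example: rotate a random query matrix for a 4-token sequence, 8-dim head.
    q = np.random.randn(4, 8)
    q_rot = rope(q)

On a GPU this step involves gathering the paired elements and applying the trigonometric multiply-adds per position, which is the data movement and dependency pattern the paper identifies as the bottleneck.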
Pages: 41-44
Number of pages: 4