RoPIM: A Processing-in-Memory Architecture for Accelerating Rotary Positional Embedding in Transformer Models

Cited by: 0
Authors
Jeon, Yunhyeong [1 ]
Jang, Minwoo [1 ]
Lee, Hwanjun [1 ]
Jung, Yeji [1 ]
Jung, Jin [2 ]
Lee, Jonggeon [2 ]
So, Jinin [2 ]
Kim, Daehoon [3 ]
Affiliations
[1] DGIST, Daegu 42988, South Korea
[2] Samsung Electronics, Hwaseong 443743, South Korea
[3] Yonsei Univ, Seoul 03722, South Korea
Keywords
Graphics processing units; Transformers; Random access memory; Kernel; Computer architecture; Natural language processing; Computational modeling; Vectors; Inverters; Encoding; Processing-in-memory; transformer model; rotary positional embedding
DOI
10.1109/LCA.2025.3535470
CLC number
TP3 [Computing technology; computer technology]
Subject classification code
0812
Abstract
The emergence of attention-based Transformer models such as GPT, BERT, and LLaMA has revolutionized Natural Language Processing (NLP) by significantly improving performance across a wide range of applications. A key driver of these improvements is positional embedding, which captures the contextual relationships between tokens in a sequence. However, existing positional embedding methods face challenges, particularly in managing the performance overhead of long sequences and in capturing relationships between adjacent tokens. Rotary Positional Embedding (RoPE) has emerged as a method that embeds positional information accurately and without requiring model retraining, even for long sequences. Despite its effectiveness, RoPE introduces a considerable performance bottleneck during inference: we observe that it accounts for 61% of GPU execution time due to extensive data movement and execution dependencies. In this paper, we introduce RoPIM, a Processing-In-Memory (PIM) architecture designed to efficiently accelerate RoPE operations in Transformer models. RoPIM uses a bank-level accelerator that reduces off-chip data movement through in-accelerator support for multiply-addition operations and minimizes operational dependencies via parallel data rearrangement. In addition, RoPIM proposes an optimized data mapping strategy that leverages both bank-level and row-level mappings to enable parallel execution, eliminate bank-to-bank communication, and reduce DRAM activations. Our experimental results show that RoPIM achieves up to a 307.9x performance improvement and 914.1x energy savings over conventional systems.
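As background for the abstract's claim that RoPE reduces to multiply-addition over rearranged data, the NumPy sketch below shows the standard RoPE formulation (from Su et al.'s RoFormer, which introduced the technique): each even/odd pair of a query or key vector is rotated by a position-dependent angle. The function name `rope`, the interleaved pair layout, and the example shapes are illustrative assumptions; this is generic reference code, not RoPIM's kernel.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply Rotary Positional Embedding to one attention head.

    x   : (seq_len, d) query or key activations, d even
    pos : (seq_len,) integer token positions
    Pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d), so
    the dot product of rotated queries and keys depends only on the
    relative distance between their positions.
    """
    seq_len, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    angles = np.outer(pos, inv_freq)               # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]         # the data-rearrangement step
    out = np.empty_like(x)
    # Two elementwise multiply-add streams per pair: the a*cos -/+ b*sin
    # pattern that a bank-level PIM unit can evaluate next to the data
    # instead of shipping activations off-chip.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Toy usage: 8 tokens, head dimension 64.
q = np.random.randn(8, 64).astype(np.float32)
q_rot = rope(q, np.arange(8))
```

Note how every output element is a two-term multiply-add over an interleaved input pair; the bank-level multiply-addition support and parallel data rearrangement described in the abstract target exactly this access pattern.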
Pages: 41-44
Page count: 4