Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design

Cited by: 29
Authors
Talati, Nishil [1]
May, Kyle [1,2]
Behroozi, Armand [1]
Yang, Yichen [1]
Kaszyk, Kuba [3]
Vasiladiotis, Christos [3]
Verma, Tarunesh [1]
Li, Lu [3]
Nguyen, Brandon [1]
Sun, Jiawen [3]
Morton, John Magnus [3]
Ahmadi, Agreen [1]
Austin, Todd [1]
O'Boyle, Michael [3]
Mahlke, Scott [1]
Mudge, Trevor [1]
Dreslinski, Ronald [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Wisconsin, Madison, WI USA
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF)
Keywords
DRAM stalls; irregular workloads; graph processing; hardware-software co-design; programming model; programmer annotations; compiler; hardware prefetching; linked data structures
DOI
10.1109/HPCA51647.2021.00061
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Irregular workloads are typically bottlenecked by the memory system. These workloads often use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to conserve space at the cost of complicated, irregular traversals. Such traversals access large volumes of data and offer little locality for caches and conventional prefetchers to exploit. This paper presents Prodigy, a low-cost hardware-software co-design solution for intelligent prefetching to improve the memory latency of several important irregular workloads. Prodigy targets irregular workloads, including graph analytics, sparse linear algebra, and fluid mechanics, that exhibit two specific types of data-dependent memory access patterns. Prodigy adopts a "best of both worlds" approach by using static program information from software and dynamic run-time information from hardware. The core of the system is the Data Indirection Graph (DIG), a proposed compact representation used to express program semantics such as the layout and memory access patterns of key data structures. The DIG representation is agnostic to any particular data structure format and is demonstrated to work with several sparse formats, including CSR and CSC. Program semantics are automatically captured with a compiler pass, encoded as a DIG, and inserted into the application binary. The DIG is then used to program a low-cost hardware prefetcher to fetch data according to an irregular algorithm's data structure traversal pattern. We equip the prefetcher with a flexible prefetching algorithm that maintains timeliness by dynamically adapting its prefetch distance to an application's execution pace. We evaluate the performance, energy consumption, and transistor cost of Prodigy using a variety of algorithms from the GAP, HPCG, and NAS benchmark suites. We compare the performance of Prodigy against a non-prefetching baseline as well as state-of-the-art prefetchers. We show that by using just 0.8KB of storage, Prodigy outperforms a non-prefetching baseline by 2.6x and saves energy by 1.6x, on average. Prodigy also outperforms modern data prefetchers by 1.5-2.3x.
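To make the abstract's point about CSR traversals concrete, here is a minimal Python sketch of the kind of data-dependent access such a traversal produces. It is an illustration only, not code from the paper: the function names `build_csr` and `sum_neighbor_values` and the array names `offsets`, `neighbors`, and `values` are hypothetical, and the two commented access patterns are our reading of what "data-dependent" means for a CSR graph, not the paper's own definitions.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR adjacency structure (hypothetical helper for illustration)."""
    # offsets[v] .. offsets[v + 1] delimits the neighbors of vertex v.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets is left unchanged
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def sum_neighbor_values(offsets: List[int], neighbors: List[int],
                        values: List[int], v: int) -> int:
    """Sum the per-vertex values of v's neighbors."""
    total = 0
    # The loop bounds are themselves loaded from memory (offsets[v], offsets[v + 1]),
    # so the range of neighbors[] that gets read depends on data.
    for k in range(offsets[v], offsets[v + 1]):
        u = neighbors[k]
        # The address of values[u] depends on the index just loaded from
        # neighbors[], which is why the access has little spatial locality.
        total += values[u]
    return total


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, neighbors = build_csr(3, edges)
values = [10, 20, 30]
print(sum_neighbor_values(offsets, neighbors, values, 0))
```

Neither the `neighbors[]` range nor the `values[u]` address can be predicted from the previous address alone, which is the behavior the abstract says conventional prefetchers struggle to exploit.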
Pages: 654-667
Page count: 14