Low-power architecture with scratch-pad memory for accelerating embedded applications with run-time reuse

Cited by: 1
Authors
Milidonis, A. [1]
Porpodas, V. [1]
Alachiotis, N. [1]
Kakarountas, A. P. [1]
Michail, H. [1]
Panagiotakopoulos, G. [1]
Goutis, C. E. [1]
Affiliation
[1] Univ Patras, Dept Elect & Comp Engn, VLSI Design Lab, Patras, Greece
DOI
10.1049/iet-cdt:20070145
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Current embedded systems are usually designed for data-dominated applications, but they have a tight energy and time budget. Scratch-pad memories are completely software-controlled memories with predictable behaviour and good performance and energy characteristics, so they tend to become a standard feature in many embedded systems. However, their predictability does not help if the application accesses its data dynamically, that is, when the addresses of the accessed data depend on the application's input. In such cases, predetermining the scratch-pad content at design time is not always possible, as the compiler cannot predict the run-time input. Moreover, in this case both data reuse and data placement in the scratch-pad are inefficient, because chunks of data already stored cannot be efficiently reused and combined with the data blocks accessed at run time. State-of-the-art techniques copy each new data block to the scratch-pad without considering whether portions of it are already there. Such dynamic temporal locality cannot be predicted or exploited by the compiler. The authors present a system architecture, strongly connected to the system's scratch-pad and the processor's compiler, that efficiently exploits run-time data reuse in the scratch-pad: it holds valuable information, such as the exact data contents of the scratch-pad at run time, and uses it to perform all the operations necessary for placing each new data block in the scratch-pad. It is fine-tuned for applications with run-time reuse between rectangular data blocks. The application domain of the proposed architecture is multimedia applications with run-time reuse, certain applications with linked lists, and multi-threaded applications. Compared with existing scratch-pad architectures without the authors' scratch-pad accelerator engine, it operates in a time- and energy-efficient manner, showing higher normalised performance and lower normalised energy consumption.
Experimental results show a performance increase of up to 2.5 times compared with existing scratch-pad architectures and up to 5 times compared with cache architectures, and an energy decrease of up to 1.9 and 3.9 times, respectively.
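The reuse-aware placement described above can be illustrated with a small software model. The sketch below is a hypothetical Python model, not the authors' hardware accelerator: it assumes the scratch-pad holds a single rectangular block at a time and that every array element costs one transfer, and the names `Rect`, `missing_parts` and `words_transferred` are invented for illustration. It contrasts copying each new block in full (the behaviour the abstract attributes to state-of-the-art techniques) with copying only the part of the block not already resident.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Hypothetical half-open 2D block [x0, x1) x [y0, y1) of an array (e.g. a frame)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        x0, y0 = max(self.x0, other.x0), max(self.y0, other.y0)
        x1, y1 = min(self.x1, other.x1), min(self.y1, other.y1)
        if x0 >= x1 or y0 >= y1:
            return None  # the blocks do not overlap
        return Rect(x0, y0, x1, y1)


def missing_parts(new: Rect, resident: Rect) -> List[Rect]:
    """Return disjoint rectangles covering the part of `new` not already in `resident`."""
    ov = new.intersect(resident)
    if ov is None:
        return [new]  # no reuse possible: the whole block must be copied
    parts = []
    if new.y0 < ov.y0:  # full-width band above the overlap
        parts.append(Rect(new.x0, new.y0, new.x1, ov.y0))
    if ov.y1 < new.y1:  # full-width band below the overlap
        parts.append(Rect(new.x0, ov.y1, new.x1, new.y1))
    if new.x0 < ov.x0:  # strip left of the overlap, within its rows
        parts.append(Rect(new.x0, ov.y0, ov.x0, ov.y1))
    if ov.x1 < new.x1:  # strip right of the overlap, within its rows
        parts.append(Rect(ov.x1, ov.y0, new.x1, ov.y1))
    return parts


def words_transferred(blocks: List[Rect], reuse: bool) -> int:
    """Count elements copied into the scratch-pad for a sequence of block requests.

    Assumes (hypothetically) that the scratch-pad holds exactly one block,
    the most recent one, and that each array element costs one transfer.
    """
    resident: Optional[Rect] = None
    total = 0
    for blk in blocks:
        if reuse and resident is not None:
            # copy only what is not already resident
            total += sum(p.area() for p in missing_parts(blk, resident))
        else:
            total += blk.area()  # copy the whole block, ignoring current contents
        resident = blk
    return total


# Hypothetical workload: a 48x48 window sliding right by 16 elements.
blocks = [Rect(16 * i, 0, 16 * i + 48, 48) for i in range(4)]
print(words_transferred(blocks, reuse=False))  # 9216
print(words_transferred(blocks, reuse=True))   # 4608
```

Under these assumptions the sliding window halves the number of transferred elements; this toy count is unrelated to the speed-up and energy figures reported in the abstract. Splitting the missing region into at most four rectangles keeps each copy a plain 2D block transfer, which is one plausible way such copies could be issued, but the paper's actual placement mechanism is not reproduced here.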
Pages: 109-123
Number of pages: 15