Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Cited by: 0
Authors
Milidonis, Athanasios [1]
Alachiotis, Nikolaos [1]
Porpodas, Vasileios [1]
Michail, Harris [1]
Panagiotakopoulos, Georgios [1]
Kakarountas, Athanasios P. [1]
Goutis, Costas E. [1]
Affiliation
[1] Univ Patras, VLSI Design Lab, Dept Elect & Comp Engn, Patras, Greece
Keywords
Decoupled; Scratch pad;
DOI
10.1007/s11265-009-0393-9
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits the more efficient prefetching of decoupled processors, which make use of the parallelism between address computation and application data processing that exists mainly in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system's performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, a processor with a limited address space (just the register file addresses). Each transfer of any block in the memory hierarchy, up to the Execute processor's register file, is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with uniprocessor architectures with (a) scratch-pad and (b) cache memory hierarchies, and with (c) existing decoupled architectures, and it shows higher normalized performance than all three. The reason for this gain is the efficiency of data transfer that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques for transferring data instead of fixed prefetching methods. Experimental results show that performance increases by up to almost 2 times compared to uniprocessor architectures with scratch-pad memories and by up to 3.7 times compared to those with caches. The proposed architecture achieves this performance without penalties in energy-delay product.
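The access/execute split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`access_program`, `dma_fill`, `execute_program`, `run_decoupled`), the strided address pattern, and the doubling kernel are all hypothetical, and the two programs run one after the other here, whereas the paper runs them in parallel on separate processors.

```python
from collections import deque


def access_program(n, stride, base):
    """Access side: only computes addresses (hypothetical strided stream)."""
    for i in range(n):
        yield base + i * stride


def dma_fill(memory, addresses, queue):
    """Stand-in for a DMA transfer: copies each addressed word into the queue."""
    for addr in addresses:
        queue.append(memory[addr])


def execute_program(queue):
    """Execute side: consumes operands in order and never computes an address."""
    total = 0
    while queue:
        total += queue.popleft() * 2  # hypothetical kernel: double and sum
    return total


def run_decoupled(memory, n, stride=1, base=0):
    """Runs the access side, then the execute side (sequentially, for clarity)."""
    queue = deque()  # models the operand path into the Execute register file
    dma_fill(memory, access_program(n, stride, base), queue)
    return execute_program(queue)


main_memory = list(range(16))
result = run_decoupled(main_memory, n=4, stride=2, base=1)
```

The point of the sketch is the separation of concerns: every memory address is produced on the access side, so the execute side sees only a stream of operands.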
Pages: 281-296
Number of pages: 16