Data Marshaling for Multi-core Architectures

Cited by: 0
Authors
Suleman, M. Aater [1]
Mutlu, Onur
Joao, Jose A. [1]
Khubaib [1]
Patt, Yale N. [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Source
ISCA 2010: THE 37TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE | 2010
Keywords
Staged Execution; Critical Sections; Pipelining; CMP
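DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract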
Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper proposes Data Marshaling (DM), a new technique to eliminate cache misses to inter-segment data. DM uses profiling to identify instructions that generate inter-segment data, and adds only 96 bytes/core of storage overhead. We show that DM significantly improves the performance of two promising Staged Execution models, Accelerated Critical Sections and producer-consumer pipeline parallelism, on both homogeneous and heterogeneous multi-core systems. In both models, DM can achieve almost all of the potential of ideally eliminating cache misses to inter-segment data. DM's performance benefit increases with the number of cores.
Pages: 441-450
Number of pages: 10
Related papers
50 in total
  • [21] Hot topic: Low power multi-core architectures
    Mudge, T
    Flautner, K
    Martin, G
    Olukotun, K
    ISLPED '05: Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005: 300 - 300
  • [22] Sparse matrix operations on several multi-core architectures
    Carsten Trinitis
    Tilman Küstner
    Josef Weidendorfer
    Jasmin Smajic
    The Journal of Supercomputing, 2011, 57 : 132 - 140
  • [23] A Unified Runtime System for Heterogeneous Multi-core Architectures
    Augonnet, Cedric
    Namyst, Raymond
    EURO-PAR 2008 WORKSHOPS - PARALLEL PROCESSING, 2009, 5415 : 174 - 183
  • [24] Fast recursive matrix multiplication for multi-core architectures
    Ruenger, Gudula
    Schwind, Michael
    ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS, 2010, 1 (01): 67 - 76
  • [25] Fast and Scalable Thread Migration for Multi-Core Architectures
    Rodrigues, Miguel
    Roma, Nuno
    Tomas, Pedro
    PROCEEDINGS IEEE/IFIP 13TH INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING 2015, 2015: 9 - 16
  • [26] Security Issues of Multi-Core Architectures - The Automotive Case
    Eckert, Claudia
    Kittel, Thomas
    IT-INFORMATION TECHNOLOGY, 2013, 55 (01): 5 - 9
  • [27] Implementing matrix multiplications on the multi-core CPU Architectures
    Baek, Nakhoon
    Lee, Hwanyong
    PROCEEDINGS OF THE 6TH WSEAS INTERNATIONAL CONFERENCE ON APPLIED COMPUTER SCIENCE, 2007: 433 - +
  • [28] StreamTMC: Stream compilation for tiled multi-core architectures
    Wei, Haitao
    Qin, Mingkang
    Zhang, Weiwei
    Yu, Junqing
    Fan, Dongrui
    Gao, Guang R.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (04) : 484 - 494
  • [29] A Hybrid Parallel Tridiagonal Solver on Multi-core Architectures
    Tang, Guangping
    Li, Kenli
    Li, Keqin
    Chen, Hang
    Du, Jiayi
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014: 605 - 614
  • [30] High Performance Global Illumination on Multi-core Architectures
    Padron, Emilio J.
    Amor, Margarita
    Doallo, Ramon
    Boo, Montserrat
    PROCEEDINGS OF THE PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2009: 93 - +