Data Marshaling for Multi-core Architectures

Cited by: 0

Authors:

Suleman, M. Aater [1]

Mutlu, Onur

Joao, Jose A. [1]

Khubaib [1]

Patt, Yale N. [1]

Affiliation:

[1] Univ Texas Austin, Austin, TX 78712 USA

Source:

ISCA 2010: The 37th Annual International Symposium on Computer Architecture | 2010

Keywords:

Staged Execution; Critical Sections; Pipelining; CMP

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper proposes Data Marshaling (DM), a new technique to eliminate cache misses to inter-segment data. DM uses profiling to identify instructions that generate inter-segment data, and adds only 96 bytes/core of storage overhead. We show that DM significantly improves the performance of two promising Staged Execution models, Accelerated Critical Sections and producer-consumer pipeline parallelism, on both homogeneous and heterogeneous multi-core systems. In both models, DM can achieve almost all of the potential of ideally eliminating cache misses to inter-segment data. DM's performance benefit increases with the number of cores.


Pages: 441-450

Page count: 10
