Threaded runtime support for execution of fine grain parallel code on coarse grain multiprocessors

Authors
Neves, R [1]
Schnabel, RB [1]
Affiliations
[1] UNIV COLORADO, DEPT COMP SCI, BOULDER, CO 80309
Funding
National Science Foundation (USA)
DOI
10.1006/jpdc.1997.1322
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The goal of this research is to provide systems support that allows fine grain, data parallel code to execute efficiently on much coarser grain multiprocessors. The task of writing parallel applications is simplified by allowing the programmer to assume a number of processors convenient to the algorithm being implemented. This paper describes and evaluates a runtime approach that efficiently manages thousands of virtual processors per actual processor. The limits of using user-level threads as fine grain virtual processors are identified. Key techniques used are tight integration and specialization of scheduling, communication, optimized context switching, and fine-tuned stack management. A prototype of this runtime approach is evaluated by comparing implementations of three problems (a smoothing kernel of a thin-layer Navier-Stokes code, a five point stencil problem, and a block bordered system of linear equations) on an Intel Paragon multiprocessor and on a network of DEC Alpha workstations. The additional cost relative to an efficient manually contracted code can be as low as 15% for granularities of 50 floating point operations per virtual processor and is typically 5-20% for granularities of about 100 floating point operations per virtual processor. The overhead is analyzed in detail to show the costs of scheduling, communication, context switching, reduced memory performance, and ensuring data consistency. The implementation and analysis indicate that fine grain code can be efficiently executed on a coarse grain multiprocessor using very lightweight, specialized threads. (C) 1997 Academic Press.
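To make the abstract's contrast between fine grain code and "manually contracted" code concrete, the following is a minimal sketch, not the authors' runtime. It writes one step of a five point stencil twice: once with one virtual processor per interior grid point, and once as the loop a programmer would write after contracting those virtual processors onto a single actual processor. Python generators stand in for user-level threads, and every name (`virtual_processor`, `fine_grain_step`, `contracted_step`) and the round-robin scheduler are assumptions made for illustration only.

```python
# Illustrative sketch only: generators stand in for user-level threads.
# All function names here are hypothetical, not taken from the paper.

def virtual_processor(i, j, old, new):
    """One virtual processor owning grid point (i, j)."""
    # "Communication" phase: read the four neighbouring values.
    s = old[i - 1][j] + old[i + 1][j] + old[i][j - 1] + old[i][j + 1]
    yield  # context switch: hand control back to the scheduler
    # Compute phase: write this point's new value.
    new[i][j] = 0.25 * s


def fine_grain_step(old):
    """Run one stencil step with one virtual processor per interior point."""
    rows, cols = len(old), len(old[0])
    new = [row[:] for row in old]  # boundary values are carried over
    ready = [
        virtual_processor(i, j, old, new)
        for i in range(1, rows - 1)
        for j in range(1, cols - 1)
    ]
    # Simple round-robin scheduler: resume each virtual processor in turn
    # until all of them have finished.
    while ready:
        still_running = []
        for vp in ready:
            try:
                next(vp)
                still_running.append(vp)
            except StopIteration:
                pass
        ready = still_running
    return new


def contracted_step(old):
    """The same step after manual contraction: one loop, no scheduling."""
    rows, cols = len(old), len(old[0])
    new = [row[:] for row in old]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                old[i - 1][j] + old[i + 1][j] + old[i][j - 1] + old[i][j + 1]
            )
    return new
```

Both functions compute the same grid; the fine grain version additionally pays for creating, scheduling, and switching between the virtual processors, which is the kind of overhead the abstract reports relative to contracted code.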
Pages: 128-142
Page count: 15