An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration

Cited by: 14
Authors
Acacio, ME
González, J
García, JM
Duato, J
Affiliations
[1] Univ Murcia, Dept Ingn & Tecnol Comp, Fac Informat, E-30071 Murcia, Spain
[2] Intel Labs Barcelona, Intel Barcelona Res Ctr, Barcelona 08034, Spain
[3] Univ Politecn Valencia, Dept Informat Sistemas & Comp, Valencia 46010, Spain
Keywords
cc-NUMA multiprocessor; directory memory overhead; L2 miss latency; three-level directory; shared data cache; on-processor-chip integration
DOI
10.1109/TPDS.2004.27
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Recent technology improvements allow multiprocessor designers to put some key components, such as the memory controller, the coherence hardware, and the network interface/router, inside the processor chip. In this paper, we exploit this scale of integration, presenting a novel node architecture aimed at reducing the long L2 miss latencies and the directory memory overhead that characterize cc-NUMA machines and limit their scalability. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each node of a multiprocessor system. Due to their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node, which enhances performance by saving accesses to the slower main memory. Scalability is guaranteed by keeping the second- and third-level directories off the processor chip and using compressed data structures. A taxonomy of L2 misses, according to the actions performed by the directory to satisfy them, is also presented. Using execution-driven simulations, we show that the proposed node architecture yields significant latency reductions, which translate into reductions in application execution time of more than 30 percent in several cases.
Pages: 755-768
Number of pages: 14
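
The lookup order that the abstract describes (a small first-level directory on chip, then second- and third-level directories off chip in compressed form) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name `ThreeLevelDirectory`, the LRU replacement, the coarse-group compression of sharer sets, and all sizes are hypothetical. The abstract says only that the off-chip levels use "compressed data structures".

```python
from collections import OrderedDict


class ThreeLevelDirectory:
    """Toy model of a three-level directory lookup (illustrative assumptions only)."""

    def __init__(self, num_nodes, l1_entries, l2_entries, group_size):
        self.num_nodes = num_nodes
        self.group_size = group_size    # nodes per coarse group (assumed compression)
        self.l1_entries = l1_entries    # capacity of the small on-chip level
        self.l2_entries = l2_entries    # capacity of the off-chip second level
        self.l1 = OrderedDict()         # block -> exact sharer set (on chip)
        self.l2 = OrderedDict()         # block -> set of sharer groups (off chip)
        self.l3 = {}                    # block -> set of sharer groups (complete)

    def _compress(self, sharers):
        # Keep only the group each sharer belongs to; precision is lost.
        return frozenset(node // self.group_size for node in sharers)

    def _expand(self, groups):
        # A compressed entry expands to every node in each recorded group.
        gs = self.group_size
        return frozenset(
            node
            for g in groups
            for node in range(g * gs, min((g + 1) * gs, self.num_nodes))
        )

    @staticmethod
    def _put(cache, capacity, block, value):
        # Insert or refresh an entry, evicting the least recently used one.
        cache[block] = value
        cache.move_to_end(block)
        while len(cache) > capacity:
            cache.popitem(last=False)

    def record(self, block, sharers):
        exact = frozenset(sharers)
        groups = self._compress(exact)
        self.l3[block] = groups
        self._put(self.l2, self.l2_entries, block, groups)
        self._put(self.l1, self.l1_entries, block, exact)

    def lookup(self, block):
        """Return (level that answered, set of nodes that may share the block)."""
        if block in self.l1:
            self.l1.move_to_end(block)
            return 1, self.l1[block]
        if block in self.l2:
            self.l2.move_to_end(block)
            return 2, self._expand(self.l2[block])
        return 3, self._expand(self.l3.get(block, frozenset()))


d = ThreeLevelDirectory(num_nodes=8, l1_entries=1, l2_entries=2, group_size=4)
d.record(0xA, {1})
d.record(0xB, {2, 5})
d.record(0xC, {7})
for block in (0xC, 0xB, 0xA):
    level, sharers = d.lookup(block)
    print(hex(block), "answered by level", level, "sharers", sorted(sharers))
```

In this toy model a first-level hit returns the exact sharer set, while hits in the compressed levels return a superset of the true sharers. That is the usual trade-off of compressed sharing codes: less directory memory in exchange for some unnecessary coherence messages.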