Enabling Scalable Chiplet-based Uniform Memory Architectures with Silicon Photonics
Cited by: 18
Authors:
Fotouhi, Pouya [1]
Werner, Sebastian [1]
Lowe-Power, Jason [2]
Ben Yoo, S. J. [1]
Affiliations:
[1] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Funding:
National Science Foundation (USA);
Keywords:
Chiplets;
Interconnects;
Memory Architecture;
Silicon Photonics;
INTERPOSER TECHNOLOGIES;
NETWORK;
DOI:
10.1145/3357526.3357564
Chinese Library Classification number:
TP301 [Theory, Methods];
Discipline classification code:
081202;
Abstract:
Chiplet-based systems have recently received much attention for scaling up processing power in HPC systems due to their high energy efficiency and low-cost manufacturing; however, large inter-chiplet NUMA latencies, distance-related energy overheads, and limited IO bandwidth caused by state-of-the-art packaging and interconnect technologies substantially limit their scalability. The large last-level caches (LLCs) of current systems (up to 16 MiB per chiplet and 40% of chiplet area) can only temporarily hide these limitations and come at the high cost and leakage power of SRAM cells. In this paper, we propose the use of integrated silicon-photonic (SiPh) interconnects on an organic package substrate which combines low material costs with high IO bandwidth, distance-independent energy consumption, and a low-latency point-to-point interconnection fabric to effectively overcome current interconnect and packaging limitations. We exploit the properties of this fabric to propose a scalable uniform memory architecture (S-UMA) that overcomes all NUMA-related performance challenges. Moreover, we propose exploiting our low-latency SiPh fabric to remove the large LLCs from the processor chiplets and re-integrate them into separate chiplets, increasing manufacturing yield by using smaller chiplets, allowing the use of the most efficient process for SRAM circuits, or easing the integration of alternative memory technologies without performance hits. Compared to state-of-the-art architectures, S-UMA offers a 23% performance speed-up and 30% network power savings on average across HPC workloads for an 8-chiplet, 64-core system.
Pages: 222-234
Page count: 13