Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime

Times cited: 2
Authors
Yamazaki, Ichitaro [1]
Kurzak, Jakub [1]
Luszczek, Piotr [1]
Dongarra, Jack [1, 2, 3]
Affiliations
[1] Univ Tennessee, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
[3] Univ Manchester, Manchester M13 9PL, Lancs, England
Funding
National Science Foundation (USA)
Keywords
systolic array; QR decomposition; multithreading; message-passing; dataflow; runtime
DOI
10.1109/IPDPSW.2014.167
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
A systolic array provides an alternative computing paradigm to the von Neumann architecture. Though its hardware implementation has failed as a paradigm to design integrated circuits in the past, we are now discovering that the systolic array as a software virtualization layer can lead to an extremely scalable execution paradigm. To demonstrate this scalability, in this paper, we design and implement a 3D virtual systolic array to compute a tile QR decomposition of a tall-and-skinny dense matrix. Our implementation is based on a state-of-the-art algorithm that factorizes a panel based on a tree-reduction. Using a runtime developed as a part of the Parallel Ultra Light Systolic Array Runtime (PULSAR) project, we demonstrate on a Cray XT5 machine how our virtual systolic array can be mapped to a large-scale machine and obtain excellent parallel performance. This is an important contribution since such a QR decomposition is used, for example, to compute a least squares solution of an overdetermined system, which arises in many scientific and engineering problems.
Pages: 1495-1504
Page count: 10
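
Illustrative sketch (not from the paper): the abstract mentions factorizing a tall-and-skinny panel by tree reduction. The general idea can be sketched in a few lines of pure Python, under loud assumptions. This is not the PULSAR runtime or the authors' 3D virtual systolic array; it runs serially, uses a plain binary tree, and computes only the R factor. The names `householder_r`, `tsqr_r`, and `block_rows` are hypothetical and chosen for this example only.

```python
import math


def householder_r(a):
    """Return the n x n upper-triangular R of an m x n matrix (list of rows, m >= n).

    Uses Householder reflections; Q is never formed.
    """
    a = [row[:] for row in a]  # work on a copy
    m, n = len(a), len(a[0])
    for k in range(n):
        # Norm of column k from the diagonal down.
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue  # column already zero below (and on) the diagonal
        # Choose the sign that avoids cancellation in v[0].
        alpha = -norm if a[k][k] >= 0 else norm
        v = [a[i][k] for i in range(k, m)]
        v[0] -= alpha
        vnorm2 = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns.
        for j in range(k, n):
            dot = sum(v[i - k] * a[i][j] for i in range(k, m))
            scale = 2.0 * dot / vnorm2
            for i in range(k, m):
                a[i][j] -= scale * v[i - k]
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, block_rows):
    """R factor of a tall-and-skinny matrix via a binary reduction tree (assumed layout)."""
    m, n = len(a), len(a[0])
    if block_rows < n or m % block_rows != 0:
        raise ValueError("this sketch needs block_rows >= n and m divisible by block_rows")
    # Leaves: factor each row block independently (these could run in parallel).
    factors = [householder_r(a[s:s + block_rows]) for s in range(0, m, block_rows)]
    # Reduction: stack pairs of R factors (2n x n) and factor them again.
    while len(factors) > 1:
        merged = [householder_r(factors[i] + factors[i + 1])
                  for i in range(0, len(factors) - 1, 2)]
        if len(factors) % 2:
            merged.append(factors[-1])  # odd one out moves up unchanged
        factors = merged
    return factors[0]
```

In this sketch the leaf factorizations are independent of each other, and each tree level halves the number of R factors, which is the property that makes the approach attractive for tall-and-skinny matrices. The resulting R satisfies R^T R = A^T A up to rounding, so it can be used in place of a direct QR when solving a least squares problem.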