A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Cited by: 10
Authors
Quintana-Orti, Gregorio [1]
Igual, Francisco D. [1]
Marques, Mercedes [1]
Quintana-Orti, Enrique S. [1]
van de Geijn, Robert A. [2]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana 12071, Spain
[2] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
Keywords
Algorithms; Performance; High-performance; libraries; linear algebra; multithreaded architectures; out-of-core algorithms; HIGH-PERFORMANCE; COMPUTATION
DOI
10.1145/2331130.2331133
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This comes from the realizations that memory is cheap and large, making it less necessary to optimally orchestrate I/O, and that new algorithms view matrices as collections of submatrices and computation as operations with those submatrices. This enables libraries to be coded at a high level of abstraction, leaving the tasks of scheduling the computations and data movement in the hands of a runtime system. This is in sharp contrast to more traditional approaches that leverage optimal use of in-core memory and, at the expense of introducing considerable programming complexity, explicit overlap of I/O with computation. Performance is demonstrated for this approach on multicore architectures as well as platforms equipped with hardware accelerators.
Pages: 25
Related papers
50 in total
  • [31] Issues in the design of scalable out-of-core dense symmetric indefinite factorization algorithms
    Strazdins, PE
    COMPUTATIONAL SCIENCE - ICCS 2003, PT III, PROCEEDINGS, 2003, 2659 : 715 - 724
  • [32] Applying out-of-core QR decomposition algorithms on FPGA-based systems
    Tai, Yi-Gang
    Lo, Chia-Tien Dan
    Psarris, Kleanthis
    2007 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, VOLS 1 AND 2, 2007, : 86 - 91
  • [33] Join, Select, and Insert: Efficient Out-of-core Algorithms for Hierarchical Segmentation Trees
    Lefevre, Josselin
    Cousty, Jean
    Perret, Benjamin
    Phelippeau, Harold
    DISCRETE GEOMETRY AND MATHEMATICAL MORPHOLOGY, DGMM 2022, 2022, 13493 : 274 - 286
  • [34] OUT-OF-CORE SOLUTION OF LINEAR EQUATIONS WITH NON-SYMMETRIC COEFFICIENT MATRIX
    HASBANI, Y
    ENGELMAN, M
    COMPUTERS & FLUIDS, 1979, 7 (01) : 13 - 31
  • [35] A compiler driven out-of-core programming approach for optimizing data locality in loop nests
    Zhang, W
    Leiss, EL
    PDPTA'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, 2001, : 41 - 47
  • [36] OUT-OF-CORE IMPLEMENTATIONS OF CHOLESKY FACTORIZATION: LOOP-BASED VERSUS RECURSIVE ALGORITHMS
    Bereux, Natacha
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2008, 30 (04) : 1302 - 1319
  • [37] Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine
    Zhao, Cheng
    Zhang, Zhibin
    Xu, Peng
    Zheng, Tianqi
    Guo, Jiafeng
    2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 673 - 684
  • [38] DI-MMAP-a scalable memory-map runtime for out-of-core data-intensive applications
    Van Essen, Brian
    Hsieh, Henry
    Ames, Sasha
    Pearce, Roger
    Gokhale, Maya
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01) : 15 - 28
  • [40] XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
    Gautier, Thierry
    Lima, Joao V. F.
    Maillard, Nicolas
    Raffin, Bruno
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 1299 - 1308