Distributed-Memory Parallel JointNMF

Cited by: 1
Authors
Eswar, Srinivas [1 ]
Cobb, Benjamin [2 ]
Hayashi, Koby [2 ]
Kannan, Ramakrishnan [3 ]
Ballard, Grey [4 ]
Vuduc, Richard [2 ]
Park, Haesun [2 ]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Wake Forest Univ, Dept Comp Sci, Winston Salem, NC 27101 USA
Funding
US National Science Foundation; US Department of Energy;
Keywords
High Performance Computing; Multimodal Inputs; Nonnegative Matrix Factorization; NONNEGATIVE MATRIX; COMMUNICATION; MPI;
DOI
10.1145/3577193.3593733
CLC classification
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares, Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms from the single processor grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated Alternating Nonnegative Least Squares (ANLS) and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers consisting of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
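To illustrate the problem the abstract describes, the sketch below implements a shared-memory Projected Gradient Descent solver for a common JointNMF formulation, min_{W,H >= 0} ||X - WH||_F^2 + alpha ||S - H^T H||_F^2, where X holds feature information and S holds (symmetric) connection information. This is a hedged, serial illustration only: the function name, the specific objective weighting, and all parameter defaults are assumptions, not the paper's distributed algorithm, which runs these updates over two processor grids with communication-avoiding matrix multiplication.

```python
import numpy as np

def jointnmf_pgd(X, S, k, alpha=1.0, eta=1e-3, iters=200, seed=0):
    """Projected gradient descent for one common JointNMF objective:
        min_{W,H >= 0} ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2
    X is an m x n feature matrix; S is an n x n symmetric connection
    matrix. Serial sketch only (assumed formulation, not the paper's
    distributed implementation)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Gradient of ||X - WH||_F^2 with respect to W.
        gW = 2.0 * (W @ H - X) @ H.T
        W = np.maximum(W - eta * gW, 0.0)  # project onto W >= 0
        # Gradient with respect to H: the fit term plus the
        # connection term 4*alpha*H(H^T H - S) (S symmetric).
        gH = 2.0 * W.T @ (W @ H - X) + 4.0 * alpha * H @ (H.T @ H - S)
        H = np.maximum(H - eta * gH, 0.0)  # project onto H >= 0
    return W, H
```

The projection step is simply elementwise clipping at zero, which is what makes plain gradient descent applicable here; the ANLS and Gauss-Newton variants the abstract mentions replace these first-order updates with more expensive per-iteration solves that reduce the objective faster.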
Pages: 301-312 (12 pages)