Distributed-Memory Parallel JointNMF

Cited by: 1

Authors:

Eswar, Srinivas [1]

Cobb, Benjamin [2]

Hayashi, Koby [2]

Kannan, Ramakrishnan [3]

Ballard, Grey [4]

Vuduc, Richard [2]

Park, Haesun [2]

Affiliations:

[1] Argonne Natl Lab, Lemont, IL 60439 USA

[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA

[3] Oak Ridge Natl Lab, Oak Ridge, TN USA

[4] Wake Forest Univ, Dept Comp Sci, Winston Salem, NC 27101 USA

Source:

PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023 | 2023

Funding:

U.S. National Science Foundation; U.S. Department of Energy

Keywords:

High Performance Computing; Multimodal Inputs; Nonnegative Matrix Factorization; Nonnegative Matrix; Communication; MPI

DOI:

10.1145/3577193.3593733

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Subject classification code:

081202

Abstract:

Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem based on Alternating Nonnegative Least Squares, Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms using a single processor grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated Alternating Nonnegative Least Squares (ANLS) and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers that consists of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.


Pages: 301-312

Page count: 12
