An Optimized Tensor Completion Library for Multiple GPUs

Cited: 0
Authors
Dun, Ming [1 ]
Li, Yunchun [1 ,2 ]
Yang, Hailong [2 ]
Sun, Qingxiao [2 ]
Luan, Zhongzhi [2 ]
Qian, Depei [2 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
tensor completion; performance optimization; GPU; algorithms
DOI
10.1145/3447818.3460692
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict missing or unobserved values in tensors. Decomposition-based tensor completion algorithms have attracted significant research attention since they exhibit better parallelization and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion to ever larger tensor data. To address these limitations, we develop cuTC, the first tensor completion library for multiple Graphics Processing Units (GPUs), with three widely used optimization algorithms: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on the GPU to enable efficient reduction. In addition, we adopt an auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show that cuTC achieves significant speedup with similar or even better accuracy.
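The abstract does not detail the TB-COO kernels themselves, but the warp-shuffle plus shared-memory reduction it refers to is a standard CUDA pattern. Below is a minimal, self-contained sketch of that pattern, assuming a plain array of nonzero values from a COO-stored tensor; the kernel name sumNonzeros and the surrounding structure are illustrative only and are not taken from cuTC.

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Warp-level sum via shuffle intrinsics: lanes exchange values directly
    // through registers, so no shared memory is needed inside a warp.
    __inline__ __device__ float warpReduceSum(float val) {
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
        return val;
    }

    // Block-level sum: each warp reduces with shuffles, warp leaders stage
    // their partial sums in shared memory, and the first warp combines them.
    __inline__ __device__ float blockReduceSum(float val) {
        __shared__ float partial[32];          // one slot per warp (<= 1024 threads)
        int lane = threadIdx.x % warpSize;
        int wid  = threadIdx.x / warpSize;

        val = warpReduceSum(val);
        if (lane == 0) partial[wid] = val;
        __syncthreads();

        int nWarps = (blockDim.x + warpSize - 1) / warpSize;
        val = (threadIdx.x < nWarps) ? partial[lane] : 0.0f;
        if (wid == 0) val = warpReduceSum(val);
        return val;                            // valid in thread 0 of the block
    }

    // Illustrative kernel (not from cuTC): sum the nonzero values of a
    // COO-stored tensor using a grid-stride loop and the reductions above.
    __global__ void sumNonzeros(const float* vals, int nnz, float* out) {
        float v = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < nnz;
             i += gridDim.x * blockDim.x)
            v += vals[i];
        v = blockReduceSum(v);
        if (threadIdx.x == 0) atomicAdd(out, v);
    }

    int main() {
        const int nnz = 1 << 20;
        std::vector<float> h(nnz, 1.0f);       // dummy nonzero values
        float *dVals, *dOut, hOut = 0.0f;
        cudaMalloc(&dVals, nnz * sizeof(float));
        cudaMalloc(&dOut, sizeof(float));
        cudaMemcpy(dVals, h.data(), nnz * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(dOut, 0, sizeof(float));
        sumNonzeros<<<256, 256>>>(dVals, nnz, dOut);
        cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expected %d)\n", hOut, nnz);
        cudaFree(dVals);
        cudaFree(dOut);
        return 0;
    }

Compiled with nvcc, this prints the exact count of the dummy ones; in an actual completion kernel the same reduction pattern would accumulate per-thread partial products for a factor-matrix update rather than a plain sum.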
Pages: 417-430
Page count: 14