An Optimized Tensor Completion Library for Multiple GPUs

Cited by: 0
Authors
Dun, Ming [1 ]
Li, Yunchun [1 ,2 ]
Yang, Hailong [2 ]
Sun, Qingxiao [2 ]
Luan, Zhongzhi [2 ]
Qian, Depei [2 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
tensor completion; performance optimization; GPU; algorithms
DOI
10.1145/3447818.3460692
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict the missing or unobserved values in tensors. Decomposition-based tensor completion algorithms have attracted significant research attention because they exhibit better parallelism and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion to ever-larger tensor data. To address these limitations, we develop cuTC, the first tensor completion library for multiple Graphics Processing Units (GPUs), with three widely used optimization algorithms: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on the GPU to enable efficient reduction. In addition, we adopt an auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show that cuTC achieves significant speedups with similar or even better accuracy.
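The abstract credits cuTC's efficient reduction to a TB-COO format that combines warp shuffle with shared memory on the GPU. As a rough, generic illustration of that reduction pattern only (the paper's actual kernels and data layout are not reproduced here), the CUDA sketch below builds a block-wide sum from warp shuffles plus a shared-memory staging step; every name in it (warpReduceSum, blockReduceSum, dotKernel) and every launch parameter is hypothetical and not part of cuTC's API.

// Illustrative sketch only: a block-wide sum reduction that combines warp
// shuffles with shared memory, the generic technique the abstract attributes
// to the TB-COO format. Names and parameters are hypothetical, not cuTC's API.
#include <cstdio>
#include <cuda_runtime.h>

// Reduce a value across one warp using register-to-register shuffles.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// Reduce across a whole thread block (block size assumed to be a multiple of
// warpSize): each warp reduces in registers, warp leaders stage their partial
// sums in shared memory, and warp 0 finishes the reduction.
__inline__ __device__ float blockReduceSum(float val) {
    __shared__ float partial[32];          // one slot per warp (<= 32 warps/block)
    int lane = threadIdx.x % warpSize;
    int wid  = threadIdx.x / warpSize;

    val = warpReduceSum(val);
    if (lane == 0) partial[wid] = val;
    __syncthreads();

    val = (threadIdx.x < blockDim.x / warpSize) ? partial[lane] : 0.0f;
    if (wid == 0) val = warpReduceSum(val);
    return val;
}

// Example kernel: sum of elementwise products of two dense vectors, the kind
// of inner product that appears when accumulating factor-matrix contributions
// for a tensor nonzero.
__global__ void dotKernel(const float* a, const float* b, float* out, int n) {
    float val = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        val += a[i] * b[i];
    val = blockReduceSum(val);
    if (threadIdx.x == 0) atomicAdd(out, val);
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    *out = 0.0f;
    dotKernel<<<256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("dot = %.0f (expected %.0f)\n", *out, 2.0f * n);
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}

In an ALS, SGD, or CCD+ update, a reduction of this shape would typically accumulate inner products of factor-matrix rows for each tensor nonzero; the block size and the algorithmic parameters that cuTC auto-tunes are outside the scope of this sketch.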
Pages: 417-430
Number of pages: 14