An Optimized Tensor Completion Library for Multiple GPUs

Cited by: 0
Authors
Dun, Ming [1 ]
Li, Yunchun [1 ,2 ]
Yang, Hailong [2 ]
Sun, Qingxiao [2 ]
Luan, Zhongzhi [2 ]
Qian, Depei [2 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
tensor completion; performance optimization; GPU; algorithms
DOI
10.1145/3447818.3460692
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict the missing or unobserved values in tensors. Decomposition-based tensor completion algorithms have attracted significant research attention because they exhibit better parallelism and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion to ever-larger tensor data. To address these limitations, we develop cuTC, the first tensor completion library for multiple Graphics Processing Units (GPUs), with three widely used optimization algorithms: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on the GPU to enable efficient reduction. In addition, we adopt an auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show that cuTC achieves significant speedups with similar or even better accuracy.
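The abstract credits cuTC's efficient reduction to a TB-COO format that combines warp shuffle with shared memory on the GPU. As a rough, generic illustration of that reduction pattern only (the paper's actual kernels and data layout are not reproduced here), the CUDA sketch below builds a block-wide sum from warp shuffles plus a shared-memory staging step; every name in it (warpReduceSum, blockReduceSum, dotKernel) and every launch parameter is hypothetical and not part of cuTC's API.

// Illustrative sketch only: a block-wide sum reduction that combines warp
// shuffles with shared memory, the generic technique the abstract attributes
// to the TB-COO format. Names and parameters are hypothetical, not cuTC's API.
#include <cstdio>
#include <cuda_runtime.h>

// Reduce a value across one warp using register-to-register shuffles.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// Reduce across a whole thread block (block size assumed to be a multiple of
// warpSize): each warp reduces in registers, warp leaders stage their partial
// sums in shared memory, and warp 0 finishes the reduction.
__inline__ __device__ float blockReduceSum(float val) {
    __shared__ float partial[32];          // one slot per warp (<= 32 warps/block)
    int lane = threadIdx.x % warpSize;
    int wid  = threadIdx.x / warpSize;

    val = warpReduceSum(val);
    if (lane == 0) partial[wid] = val;
    __syncthreads();

    val = (threadIdx.x < blockDim.x / warpSize) ? partial[lane] : 0.0f;
    if (wid == 0) val = warpReduceSum(val);
    return val;
}

// Example kernel: sum of elementwise products of two dense vectors, the kind
// of inner product that appears when accumulating factor-matrix contributions
// for a tensor nonzero.
__global__ void dotKernel(const float* a, const float* b, float* out, int n) {
    float val = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        val += a[i] * b[i];
    val = blockReduceSum(val);
    if (threadIdx.x == 0) atomicAdd(out, val);
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    *out = 0.0f;
    dotKernel<<<256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("dot = %.0f (expected %.0f)\n", *out, 2.0f * n);
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}

In an ALS, SGD, or CCD+ update, a reduction of this shape would typically accumulate inner products of factor-matrix rows for each tensor nonzero; the block size and the algorithmic parameters that cuTC auto-tunes are outside the scope of this sketch.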
Pages: 417-430
Number of pages: 14