An Optimized Tensor Completion Library for Multiple GPUs

Cited: 0
Authors
Dun, Ming [1 ]
Li, Yunchun [1 ,2 ]
Yang, Hailong [2 ]
Sun, Qingxiao [2 ]
Luan, Zhongzhi [2 ]
Qian, Depei [2 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
tensor completion; performance optimization; GPU; algorithms
DOI
10.1145/3447818.3460692
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict missing or unobserved values in tensors. Decomposition-based tensor completion algorithms have attracted significant research attention since they exhibit better parallelization and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion to ever larger tensor data. To address these limitations, we develop cuTC, the first tensor completion library for multiple Graphics Processing Units (GPUs), with three widely used optimization algorithms: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on the GPU to enable efficient reduction. In addition, we adopt an auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show that cuTC achieves significant speedup with similar or even better accuracy.
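The abstract does not detail the TB-COO kernels themselves, but the warp-shuffle plus shared-memory reduction it refers to is a standard CUDA pattern. Below is a minimal, self-contained sketch of that pattern, assuming a plain array of nonzero values from a COO-stored tensor; the kernel name sumNonzeros and the surrounding structure are illustrative only and are not taken from cuTC.

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Warp-level sum via shuffle intrinsics: lanes exchange values directly
    // through registers, so no shared memory is needed inside a warp.
    __inline__ __device__ float warpReduceSum(float val) {
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
        return val;
    }

    // Block-level sum: each warp reduces with shuffles, warp leaders stage
    // their partial sums in shared memory, and the first warp combines them.
    __inline__ __device__ float blockReduceSum(float val) {
        __shared__ float partial[32];          // one slot per warp (<= 1024 threads)
        int lane = threadIdx.x % warpSize;
        int wid  = threadIdx.x / warpSize;

        val = warpReduceSum(val);
        if (lane == 0) partial[wid] = val;
        __syncthreads();

        int nWarps = (blockDim.x + warpSize - 1) / warpSize;
        val = (threadIdx.x < nWarps) ? partial[lane] : 0.0f;
        if (wid == 0) val = warpReduceSum(val);
        return val;                            // valid in thread 0 of the block
    }

    // Illustrative kernel (not from cuTC): sum the nonzero values of a
    // COO-stored tensor using a grid-stride loop and the reductions above.
    __global__ void sumNonzeros(const float* vals, int nnz, float* out) {
        float v = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < nnz;
             i += gridDim.x * blockDim.x)
            v += vals[i];
        v = blockReduceSum(v);
        if (threadIdx.x == 0) atomicAdd(out, v);
    }

    int main() {
        const int nnz = 1 << 20;
        std::vector<float> h(nnz, 1.0f);       // dummy nonzero values
        float *dVals, *dOut, hOut = 0.0f;
        cudaMalloc(&dVals, nnz * sizeof(float));
        cudaMalloc(&dOut, sizeof(float));
        cudaMemcpy(dVals, h.data(), nnz * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(dOut, 0, sizeof(float));
        sumNonzeros<<<256, 256>>>(dVals, nnz, dOut);
        cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expected %d)\n", hOut, nnz);
        cudaFree(dVals);
        cudaFree(dOut);
        return 0;
    }

Compiled with nvcc, this prints the exact count of the dummy ones; in an actual completion kernel the same reduction pattern would accumulate per-thread partial products for a factor-matrix update rather than a plain sum.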
Pages: 417-430
Page count: 14