Securing AI Inference in the Cloud: Is CPU-GPU Confidential Computing Ready?

Cited by: 0
Authors
Mohan, Apoorve [1 ]
Ye, Mengmei [1 ]
Franke, Hubertus [1 ]
Srivatsa, Mudhakar [1 ]
Liu, Zhuoran [1 ]
Gonzalez, Nelson Mimura [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
confidential computing; cloud security; cloud computing; foundation models; large language models; high performance computing;
DOI
10.1109/CLOUD62652.2024.00028
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812
Abstract
Many applications have been offloaded to cloud environments to achieve higher agility, access to more powerful computational resources, and better infrastructure management. Although cloud environments provide solid security solutions, users with highly sensitive data or regulatory compliance requirements, such as HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation), still hesitate to move such application domains to the cloud. To address these concerns, cloud service providers have started to offer solutions that protect data confidentiality and integrity through trusted execution environments (TEEs). While these have so far been limited to CPU TEEs, NVIDIA's Hopper architecture has shifted the landscape by enabling the confidential computing features essential to protecting confidentiality and integrity for real-world applications offloaded to GPUs, such as large language models (LLMs). However, there has been no sufficient study of how much performance overhead confidential computing introduces in a TEE comprising a CPU-GPU configuration. In this paper, we evaluate a confidential computing environment comprising an Intel TDX system and NVIDIA H100 GPUs through various microbenchmarks and real workloads, including the BERT, LLaMA, and Granite large language models, and discuss the overhead incurred by confidential computing when GPUs are utilized. We show that while LLM inference is sensitive to model type and batch size, when larger models with pipelined processing are deployed, the performance of LLM inference in CPU-GPU TEEs can be nearly on par with that of their non-confidential setups.
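The evaluation methodology the abstract describes, timing the same inference workload in a TEE-enabled and a non-confidential configuration and comparing the results across batch sizes, can be sketched with a minimal timing harness. This is an illustrative sketch only, not the authors' actual benchmark code; `infer_fn` stands in for any inference callable (e.g. a model's forward pass), and the warmup/iteration counts are arbitrary assumptions.

```python
import time
import statistics


def benchmark_inference(infer_fn, batch_sizes, warmup=2, iters=5):
    """Time an inference callable across batch sizes.

    Returns {batch_size: (mean_latency_s, throughput_samples_per_s)}.
    """
    results = {}
    for bs in batch_sizes:
        batch = [0] * bs  # placeholder inputs; a real harness would build model inputs
        for _ in range(warmup):  # warm up caches / JIT before measuring
            infer_fn(batch)
        times = []
        for _ in range(iters):
            t0 = time.perf_counter()
            infer_fn(batch)
            times.append(time.perf_counter() - t0)
        mean_latency = statistics.mean(times)
        results[bs] = (mean_latency, bs / mean_latency)
    return results


def overhead_pct(tee_latency, baseline_latency):
    """Relative slowdown of the confidential (TEE) run over the baseline run."""
    return 100.0 * (tee_latency - baseline_latency) / baseline_latency
```

In use, one would run `benchmark_inference` once on a TDX+H100 confidential VM and once on an otherwise identical non-confidential instance, then feed the per-batch-size mean latencies into `overhead_pct` to quantify the confidential-computing overhead the paper measures.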
Pages: 164-175 (12 pages)
Related Papers
50 records
  • [1] Algorithm for Cooperative CPU-GPU Computing
    Aciu, Razvan-Mihai
    Ciocarlie, Horia
    2013 15TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2013), 2014, : 352 - 358
  • [2] A survey on techniques for cooperative CPU-GPU computing
    Raju, K.
    Chiplunkar, Niranjan N.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 19 : 72 - 85
  • [3] A Survey of CPU-GPU Heterogeneous Computing Techniques
    Mittal, Sparsh
    Vetter, Jeffrey S.
    ACM COMPUTING SURVEYS, 2015, 47 (04)
  • [4] Accelerating Pattern Matching with CPU-GPU Collaborative Computing
    Sanz, Victoria
    Pousa, Adrian
    Naiouf, Marcelo
    De Giusti, Armando
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 310 - 322
  • [5] Boosting CUDA Applications with CPU-GPU Hybrid Computing
    Lee, Changmin
    Ro, Won Woo
    Gaudiot, Jean-Luc
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, 42 (02) : 384 - 404
  • [6] A hybrid computing method of SpMV on CPU-GPU heterogeneous computing systems
    Yang, Wangdong
    Li, Kenli
    Li, Keqin
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2017, 104 : 49 - 60
  • [7] GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors
    Hestness, Joel
    Keckler, Stephen W.
    Wood, David A.
    2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, : 87 - 97
  • [8] Heterogeneous Computing (CPU-GPU) for Pollution Dispersion in an Urban Environment
    Fernandez, Gonzalo
    Mendina, Mariana
    Usera, Gabriel
    COMPUTATION, 2020, 8 (01)
  • [9] Parallel CPU-GPU computing technique for discrete element method
    Skorych, Vasyl
    Dosta, Maksym
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11):
  • [10] Molecular Docking Simulation Based on CPU-GPU Heterogeneous Computing
    Xu, Jinyan
    Li, Jianhua
    Cai, Yining
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, 2017, 10561 : 27 - 37