Securing AI Inference in the Cloud: Is CPU-GPU Confidential Computing Ready?

Cited by: 0
Authors
Mohan, Apoorve [1 ]
Ye, Mengmei [1 ]
Franke, Hubertus [1 ]
Srivatsa, Mudhakar [1 ]
Liu, Zhuoran [1 ]
Gonzalez, Nelson Mimura [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
confidential computing; cloud security; cloud computing; foundation models; large language models; high performance computing;
DOI
10.1109/CLOUD62652.2024.00028
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812
Abstract
Many applications have been offloaded to cloud environments to achieve higher agility, access to more powerful computational resources, and better infrastructure management. Although cloud environments provide solid security solutions, users with highly sensitive data or regulatory compliance requirements, such as HIPAA (Health Insurance Portability and Accountability Act) and GDPR (General Data Protection Regulation), still hesitate to move such application domains to the cloud. To address these concerns, cloud service providers have started to offer solutions that protect data confidentiality and integrity through trusted execution environments (TEEs). While these have so far been limited to CPU TEEs, NVIDIA's Hopper architecture has shifted the landscape by enabling the confidential computing features essential to protecting confidentiality and integrity for real-world applications offloaded to GPUs, such as large language models (LLMs). However, there has been no sufficient study of how much performance overhead confidential computing introduces in a TEE comprising a CPU-GPU configuration. In this paper, we evaluate a confidential computing environment comprising an Intel TDX system and NVIDIA H100 GPUs through various microbenchmarks and real workloads, including the BERT, LLaMA, and Granite large language models, and discuss the overhead incurred by confidential computing when GPUs are utilized. We show that while LLM inference is sensitive to model type and batch size, when larger models with pipelined processing are deployed, the performance of LLM inference in CPU-GPU TEEs can be nearly on par with that of their non-confidential setups.
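The evaluation methodology the abstract describes, timing the same inference workload in a TEE-enabled and a non-confidential configuration and comparing the results across batch sizes, can be sketched with a minimal timing harness. This is an illustrative sketch only, not the authors' actual benchmark code; `infer_fn` stands in for any inference callable (e.g. a model's forward pass), and the warmup/iteration counts are arbitrary assumptions.

```python
import time
import statistics


def benchmark_inference(infer_fn, batch_sizes, warmup=2, iters=5):
    """Time an inference callable across batch sizes.

    Returns {batch_size: (mean_latency_s, throughput_samples_per_s)}.
    """
    results = {}
    for bs in batch_sizes:
        batch = [0] * bs  # placeholder inputs; a real harness would build model inputs
        for _ in range(warmup):  # warm up caches / JIT before measuring
            infer_fn(batch)
        times = []
        for _ in range(iters):
            t0 = time.perf_counter()
            infer_fn(batch)
            times.append(time.perf_counter() - t0)
        mean_latency = statistics.mean(times)
        results[bs] = (mean_latency, bs / mean_latency)
    return results


def overhead_pct(tee_latency, baseline_latency):
    """Relative slowdown of the confidential (TEE) run over the baseline run."""
    return 100.0 * (tee_latency - baseline_latency) / baseline_latency
```

In use, one would run `benchmark_inference` once on a TDX+H100 confidential VM and once on an otherwise identical non-confidential instance, then feed the per-batch-size mean latencies into `overhead_pct` to quantify the confidential-computing overhead the paper measures.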
Pages: 164-175 (12 pages)
Related Papers
50 records
  • [1] Algorithm for Cooperative CPU-GPU Computing
    Aciu, Razvan-Mihai
    Ciocarlie, Horia
    2013 15TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2013), 2014, : 352 - 358
  • [2] A survey on techniques for cooperative CPU-GPU computing
    Raju, K.
    Chiplunkar, Niranjan N.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 19 : 72 - 85
  • [3] A Survey of CPU-GPU Heterogeneous Computing Techniques
    Mittal, Sparsh
    Vetter, Jeffrey S.
    ACM COMPUTING SURVEYS, 2015, 47 (04)
  • [4] Accelerating Pattern Matching with CPU-GPU Collaborative Computing
    Sanz, Victoria
    Pousa, Adrian
    Naiouf, Marcelo
    De Giusti, Armando
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 310 - 322
  • [5] Boosting CUDA Applications with CPU-GPU Hybrid Computing
    Lee, Changmin
    Ro, Won Woo
    Gaudiot, Jean-Luc
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, 42 (02) : 384 - 404
  • [6] A hybrid computing method of SpMV on CPU-GPU heterogeneous computing systems
    Yang, Wangdong
    Li, Kenli
    Li, Keqin
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2017, 104 : 49 - 60
  • [7] GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors
    Hestness, Joel
    Keckler, Stephen W.
    Wood, David A.
    2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, : 87 - 97
  • [8] Heterogeneous Computing (CPU-GPU) for Pollution Dispersion in an Urban Environment
    Fernandez, Gonzalo
    Mendina, Mariana
    Usera, Gabriel
    COMPUTATION, 2020, 8 (01)
  • [9] Parallel CPU-GPU computing technique for discrete element method
    Skorych, Vasyl
    Dosta, Maksym
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11):
  • [10] Molecular Docking Simulation Based on CPU-GPU Heterogeneous Computing
    Xu, Jinyan
    Li, Jianhua
    Cai, Yining
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, 2017, 10561 : 27 - 37