Storage access optimization for efficient GPU-centric information retrieval

Cited by: 0

Authors:

Shrestha, Susav [1]

Gautam, Aayush [1]

Reddy, Narasimha [1]

Affiliations:

[1] Texas A&M Univ, Elect & Comp Engn, 400 Bizzell St, College Stn, TX 77843 USA

Source:

JOURNAL OF SUPERCOMPUTING | 2025 / Volume 81 / Issue 04

Keywords:

GPUDirect storage; SSD optimization; AI/ML workloads; Information retrieval

DOI:

10.1007/s11227-025-07118-9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The rapid growth of AI/ML workloads has outpaced the capabilities of CPU-centric architectures to deliver the required data throughput and compute efficiency. This paper introduces a GPU-centric architecture leveraging GPUDirect Storage (GDS) to transfer data directly from SSDs to GPU memory, bypassing CPU bottlenecks and enabling high-throughput data paths. We propose Embedding from Storage Pipelined Network (ESPN) and its extension, ESPN-LIVE, which employ optimizations like data prefetching and on-demand embedding generation to align storage latency with GPU throughput. Experiments show ESPN reduces query latency by up to 3.9×, cuts memory usage by up to 16×, and improves throughput by up to 68%. ESPN-LIVE eliminates the need to store multi-vector embeddings by dynamically computing document representations, reducing storage costs by up to 16× and making it particularly effective for single-query systems. These results highlight the potential of SSD-GPU integration for scalable, high-performance AI/ML workloads in information retrieval and LLM applications.


Pages: 18
