Storage access optimization for efficient GPU-centric information retrieval

Authors
Shrestha, Susav [1]
Gautam, Aayush [1]
Reddy, Narasimha [1]
Affiliations
[1] Texas A&M Univ, Elect & Comp Engn, 400 Bizzell St, College Stn, TX 77843 USA
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 04
Keywords
GPUDirect storage; SSD optimization; AI/ML workloads; Information retrieval
DOI
10.1007/s11227-025-07118-9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The rapid growth of AI/ML workloads has outpaced the capabilities of CPU-centric architectures to deliver the required data throughput and compute efficiency. This paper introduces a GPU-centric architecture leveraging GPUDirect Storage (GDS) to transfer data directly from SSDs to GPU memory, bypassing CPU bottlenecks and enabling high-throughput data paths. We propose Embedding from Storage Pipelined Network (ESPN) and its extension, ESPN-LIVE, which employ optimizations like data prefetching and on-demand embedding generation to align storage latency with GPU throughput. Experiments show ESPN reduces query latency by up to 3.9×, cuts memory usage by up to 16×, and improves throughput by up to 68%. ESPN-LIVE eliminates the need to store multi-vector embeddings by dynamically computing document representations, reducing storage costs by up to 16× and making it particularly effective for single-query systems. These results highlight the potential of SSD-GPU integration for scalable, high-performance AI/ML workloads in information retrieval and LLM applications.
Pages: 18