Inference serving with end-to-end latency SLOs over dynamic edge networks

Cited by: 1
Authors
Nigade, Vinod [1 ]
Bauszat, Pablo [1 ]
Bal, Henri [1 ]
Wang, Lin [1 ]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; VIDEO
DOI
10.1007/s11241-024-09418-4
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when the request has to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, where the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users to DNNs and adapt DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
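The core tradeoff described in the abstract, picking a data configuration and a DNN variant per user so that the estimated end-to-end latency stays within the SLO while accuracy is maximized, can be illustrated with a small sketch. The code below is not the authors' Jellyfish implementation; all names, profile numbers, the latency model, and the greedy per-user selection are hypothetical assumptions made only to show the idea.

```python
# Minimal illustrative sketch (hypothetical, not the Jellyfish implementation):
# choose the highest-accuracy (input resolution, DNN variant) pair whose
# estimated network + compute latency fits the end-to-end SLO.
from dataclasses import dataclass

@dataclass
class DnnVariant:
    name: str
    accuracy: float        # profiled accuracy (hypothetical values)
    compute_ms: float      # profiled server-side inference time in ms

@dataclass
class DataConfig:
    resolution: int        # input resolution in pixels
    size_kbits: float      # encoded frame size in kilobits
    accuracy_scale: float  # assumed accuracy penalty at this resolution

VARIANTS = [
    DnnVariant("small", accuracy=0.60, compute_ms=15.0),
    DnnVariant("medium", accuracy=0.70, compute_ms=35.0),
    DnnVariant("large", accuracy=0.78, compute_ms=70.0),
]

DATA_CONFIGS = [
    DataConfig(320, size_kbits=120.0, accuracy_scale=0.90),
    DataConfig(480, size_kbits=260.0, accuracy_scale=0.96),
    DataConfig(640, size_kbits=450.0, accuracy_scale=1.00),
]

def pick_config(bandwidth_kbps: float, slo_ms: float):
    """Return the (data, dnn, latency_ms) choice with the best estimated
    accuracy that still meets the SLO, or None if no choice fits."""
    best, best_acc = None, -1.0
    for data in DATA_CONFIGS:
        network_ms = data.size_kbits / bandwidth_kbps * 1000.0
        for dnn in VARIANTS:
            latency = network_ms + dnn.compute_ms
            acc = dnn.accuracy * data.accuracy_scale
            if latency <= slo_ms and acc > best_acc:
                best, best_acc = (data, dnn, latency), acc
    return best

if __name__ == "__main__":
    choice = pick_config(bandwidth_kbps=2000.0, slo_ms=100.0)
    if choice:
        data, dnn, latency = choice
        print(f"{data.resolution}px + {dnn.name}: ~{latency:.1f} ms")
```

In the paper's setting these decisions are made collectively for all users sharing the edge server and are revisited as bandwidth estimates change; the per-user greedy loop above only conveys the accuracy-versus-latency selection, not the coordination across users.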
Pages: 239-290
Page count: 52