Inference serving with end-to-end latency SLOs over dynamic edge networks

Cited by: 1
Authors
Nigade, Vinod [1]
Bauszat, Pablo [1]
Bal, Henri [1]
Wang, Lin [1]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council;
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; VIDEO;
DOI
10.1007/s11241-024-09418-4
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, where the decisions for data and DNN adaptations are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that we fulfill latency SLOs while maximizing the overall inference accuracy. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
Pages: 239-290
Number of pages: 52
Related papers
50 in total
  • [41] End-to-End Latency Analysis in Wireless Networks with Queuing Models for General Prioritized Traffic
    Schulz, Philipp
    Ong, Lyndon
    Littlewood, Paul
    Abdullah, Bashar
    Simsek, Meryem
    Fettweis, Gerhard
    2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS (ICC WORKSHOPS), 2019,
  • [42] End-to-End Hierarchical Fuzzy Inference Solution
    Mutlu, Begum
    Sezer, Ebru A.
    Akcayol, M. Ali
    2018 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2018,
  • [43] Dynamic Parameter Setting for End-to-End TCP Enhancement Schemes Over Mixed Wired/Wireless Networks
    Cheung, Chi-Chung
    ISWPC: 2009 4TH INTERNATIONAL SYMPOSIUM ON WIRELESS PERVASIVE COMPUTING, 2009, : 158 - 162
  • [44] End-to-end Flow Inference of Encrypted MANET
    Chang, Huijun
    Shan, Hong
    2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 1104 - 1109
  • [45] End-to-end support over Heterogeneous Wired-Wireless Networks
    Zeadally, Sherali
    Wei, Bin
    Landfeldt, Bjorn
    COMPUTER COMMUNICATIONS, 2008, 31 (11) : 2643 - 2645
  • [46] Adaptive End-to-End QoS for Multimedia over Heterogeneous Wireless Networks
    Karimi, Ouldooz Baghban
    Fathy, Mahmood
    Yousefi, Saleh
    ADVANCES IN COMPUTER SCIENCE AND ENGINEERING, 2008, 6 : 160 - 167
  • [47] End-to-End Delay Analysis of Wireless ECG over Cellular Networks
    Yoon, Man-Ki
    Kim, Jung-Eun
    Kang, Kyungtae
    Park, Kyung-Joon
    Nam, Min-Young
    Sha, Lui
    1ST ACM INTERNATIONAL WORKSHOP ON MEDICAL-GRADE WIRELESS NETWORKS, 2009, : 21 - 26
  • [48] Bandwidth Measurement and Management for End-to-End Connectivity over IP Networks
    Ravindran, K.
    Rabby, M.
    Liu, X.
    2009 FIRST INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORKS (COMSNETS 2009), 2009, : 42 - +
  • [49] Adaptive end-to-end QoS for multimedia over heterogeneous wireless networks
    Karimi, Ouldooz Baghban
    Fathy, Mahmood
    COMPUTERS & ELECTRICAL ENGINEERING, 2010, 36 (01) : 45 - 55
  • [50] TurboMouse: End-to-end Latency Compensation in Indirect Interaction
    Antoine, Axel
    Malacria, Sylvain
    Casiez, Gery
    CHI 2018: EXTENDED ABSTRACTS OF THE 2018 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2018,