PLAYS: Minimizing DNN Inference Latency in Serverless Edge Cloud for Artificial Intelligence of Things

Cited by: 1
Authors
Geng, Hongmin [1 ,2 ]
Zeng, Deze [1 ,2 ]
Li, Yuepeng [3 ]
Gu, Lin [4 ]
Chen, Quan [3 ]
Li, Peng [5 ]
Affiliations
[1] China Univ Geosci, Engn Res Ctr Nat Resource Informat Management & Di, Minist Educ, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[5] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 9658580, Japan
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 23
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Containers; Task analysis; Artificial neural networks; Internet of Things; Inference algorithms; Artificial intelligence; Computational modeling; Artificial Intelligence of Things (AIoT); distributed deep neural network (DNN) inference; serverless edge cloud; task scheduling;
DOI
10.1109/JIOT.2024.3443289
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Thanks to its fine-grained resource allocation and fast task scheduling, serverless computing has been adopted in edge clouds to accommodate various applications, e.g., deep neural network (DNN) inference for the Artificial Intelligence of Things (AIoT). In a serverless edge cloud, servers are started on demand. However, because the architecture is container based, the inherently sequential container startup strongly affects DNN inference performance. In this article, we investigate the distributed DNN inference problem in the serverless edge cloud with this characteristic in mind, aiming to eliminate the extra container startup time and thereby minimize DNN inference latency. We formulate the problem in a nonlinear optimization form and then linearize it into an integer programming problem, which we prove to be NP-hard. To tackle the computational complexity, we propose a priority-based layer scheduling (PLAYS) algorithm. Extensive experimental results verify the effectiveness and adaptability of PLAYS in comparison with other state-of-the-art algorithms on several well-known DNN models.
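To make the scheduling idea from the abstract concrete, below is a minimal, hypothetical Python sketch of priority-based layer scheduling for a chain-structured DNN over serverless edge servers, where each container pays a startup delay the first time it is used. This is only an illustration of the general problem the abstract describes, not the paper's actual PLAYS algorithm; all names and parameters here (Server, schedule_layers, startup_delay, speed) are assumptions made for illustration.

```python
# Hypothetical sketch, NOT the paper's PLAYS algorithm: greedily assign the
# layers of a chain-structured DNN to serverless edge servers, charging each
# server a one-time sequential container startup delay on first use.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    startup_delay: float   # one-time container startup cost (seconds)
    speed: float           # relative compute speed (exec time = cost / speed)
    busy_until: float = 0.0
    started: bool = False

def schedule_layers(layer_costs, servers):
    """Greedy earliest-finish-time assignment of a layer chain to servers.

    Layers are scheduled in topological (chain) order, which doubles as
    their priority; each layer starts no earlier than its predecessor's
    finish, and a server pays its container startup delay only on first
    use -- the extra cost the abstract aims to eliminate.
    """
    prev_finish = 0.0
    plan = []
    for i, cost in enumerate(layer_costs):
        best_finish, best_server = float("inf"), None
        for s in servers:
            # a cold container must first pay its sequential startup delay
            ready = s.busy_until + (0.0 if s.started else s.startup_delay)
            finish = max(prev_finish, ready) + cost / s.speed
            if finish < best_finish:
                best_finish, best_server = finish, s
        best_server.busy_until = best_finish
        best_server.started = True
        prev_finish = best_finish
        plan.append((f"layer{i}", best_server.name, round(best_finish, 3)))
    return plan, prev_finish

if __name__ == "__main__":
    servers = [Server("edge-a", startup_delay=1.5, speed=1.0),
               Server("edge-b", startup_delay=1.5, speed=2.0)]
    plan, latency = schedule_layers([0.4, 0.8, 0.6, 1.2], servers)
    for step in plan:
        print(step)
    print(f"end-to-end latency: {latency:.2f}s")
```

In this toy model the chain order itself serves as the priority and the greedy rule naturally sticks to already-warm containers to avoid repeated startup cost; the actual priority metric, system model, and proofs are defined in the paper itself.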
Pages: 37731 - 37740
Page count: 10
Related Papers
50 records in total
  • [1] A Survey on Collaborative DNN Inference for Edge Intelligence
    Ren, Wei-Qing
    Qu, Yu-Ben
    Dong, Chao
    Jing, Yu-Qian
    Sun, Hao
    Wu, Qi-Hui
    Guo, Song
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (03) : 370 - 395
  • [2] Minimizing Latency for Multi-DNN Inference on Resource-Limited CPU-Only Edge Devices
    Wang, Tao
    Shi, Tuo
    Liu, Xiulong
    Wang, Jianping
    Liu, Bin
    Li, Yingshu
    She, Yechao
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 2239 - 2248
  • [3] Resource-Efficient DNN Inference With Early Exiting in Serverless Edge Computing
    Guo, Xiaolin
    Dong, Fang
    Shen, Dian
    Huang, Zhaowu
    Zhang, Jinghui
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (05) : 3650 - 3666
  • [4] Accelerating DNN Inference by Edge-Cloud Collaboration
    Chen, Jianan
    Qi, Qi
    Wang, Jingyu
    Sun, Haifeng
    Liao, Jianxin
2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021
  • [5] Operating Latency Sensitive Applications on Public Serverless Edge Cloud Platforms
    Pelle, Istvan
    Czentye, Janos
    Doka, Janos
    Kern, Andras
    Gero, Balazs P.
    Sonkoly, Balazs
IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (10) : 7954 - 7972
  • [6] Missing Value Filling Based on the Collaboration of Cloud and Edge in Artificial Intelligence of Things
    Wang, Tian
    Ke, Haoxiong
    Jolfaei, Alireza
    Wen, Sheng
    Haghighi, Mohammad Sayad
    Huang, Shuqiang
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (08) : 5394 - 5402
  • [7] FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge
    Taki, Sifat Ut
    Padmanabhan, Arthi
    Mastorakis, Spyridon
    2024 IEEE/ACM SYMPOSIUM ON EDGE COMPUTING, SEC 2024, 2024, : 98 - 109
  • [8] PArtNNer: Platform-Agnostic Adaptive Edge-Cloud DNN Partitioning for Minimizing End-to-End Latency
    Ghosh, Soumendu Kumar
    Raha, Arnab
    Raghunathan, Vijay
    Raghunathan, Anand
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (01)
  • [9] Taming Serverless Cold Start of Cloud Model Inference With Edge Computing
    Zhao, Kongyange
    Zhou, Zhi
    Jiao, Lei
    Cai, Shen
    Xu, Fei
    Chen, Xu
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8111 - 8128