PLAYS: Minimizing DNN Inference Latency in Serverless Edge Cloud for Artificial Intelligence of Things

Cited by: 1
Authors
Geng, Hongmin [1 ,2 ]
Zeng, Deze [1 ,2 ]
Li, Yuepeng [3 ]
Gu, Lin [4 ]
Chen, Quan [3 ]
Li, Peng [5 ]
Affiliations
[1] China Univ Geosci, Engn Res Ctr Nat Resource Informat Management & Di, Minist Educ, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[5] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu 9658580, Japan
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 23
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Containers; Task analysis; Artificial neural networks; Internet of Things; Inference algorithms; Artificial intelligence; Computational modeling; Artificial Intelligence of Things (AIoT); distributed deep neural network (DNN) inference; serverless edge cloud; task scheduling;
DOI
10.1109/JIOT.2024.3443289
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Thanks to its fine-grained resource allocation and fast task scheduling, serverless computing has been adopted in edge clouds to accommodate various applications, e.g., deep neural network (DNN) inference for the Artificial Intelligence of Things (AIoT). In a serverless edge cloud, servers are started on demand. However, because the architecture is container based, the inherently sequential container startup strongly affects DNN inference performance. In this article, we investigate the distributed DNN inference problem in the serverless edge cloud with this characteristic in mind, aiming to eliminate the extra container startup time and thereby minimize DNN inference latency. We formulate the problem in a nonlinear optimization form and then linearize it into an integer programming problem, which we prove to be NP-hard. To tackle the computational complexity, we propose a priority-based layer scheduling (PLAYS) algorithm. Extensive experimental results verify the effectiveness and adaptability of PLAYS in comparison with other state-of-the-art algorithms on several well-known DNN models.
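To make the scheduling idea from the abstract concrete, below is a minimal, hypothetical Python sketch of priority-based layer scheduling for a chain-structured DNN over serverless edge servers, where each container pays a startup delay the first time it is used. This is only an illustration of the general problem the abstract describes, not the paper's actual PLAYS algorithm; all names and parameters here (Server, schedule_layers, startup_delay, speed) are assumptions made for illustration.

```python
# Hypothetical sketch, NOT the paper's PLAYS algorithm: greedily assign the
# layers of a chain-structured DNN to serverless edge servers, charging each
# server a one-time sequential container startup delay on first use.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    startup_delay: float   # one-time container startup cost (seconds)
    speed: float           # relative compute speed (exec time = cost / speed)
    busy_until: float = 0.0
    started: bool = False

def schedule_layers(layer_costs, servers):
    """Greedy earliest-finish-time assignment of a layer chain to servers.

    Layers are scheduled in topological (chain) order, which doubles as
    their priority; each layer starts no earlier than its predecessor's
    finish, and a server pays its container startup delay only on first
    use -- the extra cost the abstract aims to eliminate.
    """
    prev_finish = 0.0
    plan = []
    for i, cost in enumerate(layer_costs):
        best_finish, best_server = float("inf"), None
        for s in servers:
            # a cold container must first pay its sequential startup delay
            ready = s.busy_until + (0.0 if s.started else s.startup_delay)
            finish = max(prev_finish, ready) + cost / s.speed
            if finish < best_finish:
                best_finish, best_server = finish, s
        best_server.busy_until = best_finish
        best_server.started = True
        prev_finish = best_finish
        plan.append((f"layer{i}", best_server.name, round(best_finish, 3)))
    return plan, prev_finish

if __name__ == "__main__":
    servers = [Server("edge-a", startup_delay=1.5, speed=1.0),
               Server("edge-b", startup_delay=1.5, speed=2.0)]
    plan, latency = schedule_layers([0.4, 0.8, 0.6, 1.2], servers)
    for step in plan:
        print(step)
    print(f"end-to-end latency: {latency:.2f}s")
```

In this toy model the chain order itself serves as the priority and the greedy rule naturally sticks to already-warm containers to avoid repeated startup cost; the actual priority metric, system model, and proofs are defined in the paper itself.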
Pages: 37731 - 37740
Page count: 10
Related Papers
50 records in total
  • [1] A Survey on Collaborative DNN Inference for Edge Intelligence
    Ren, Wei-Qing
    Qu, Yu-Ben
    Dong, Chao
    Jing, Yu-Qian
    Sun, Hao
    Wu, Qi-Hui
    Guo, Song
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (03) : 370 - 395
  • [2] Minimizing Latency for Multi-DNN Inference on Resource-Limited CPU-Only Edge Devices
    Wang, Tao
    Shi, Tuo
    Liu, Xiulong
    Wang, Jianping
    Liu, Bin
    Li, Yingshu
    She, Yechao
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 2239 - 2248
  • [3] Resource-Efficient DNN Inference With Early Exiting in Serverless Edge Computing
    Guo, Xiaolin
    Dong, Fang
    Shen, Dian
    Huang, Zhaowu
    Zhang, Jinghui
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (05) : 3650 - 3666
  • [4] Accelerating DNN Inference by Edge-Cloud Collaboration
    Chen, Jianan
    Qi, Qi
    Wang, Jingyu
    Sun, Haifeng
    Liao, Jianxin
2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021
  • [5] Operating Latency Sensitive Applications on Public Serverless Edge Cloud Platforms
    Pelle, Istvan
    Czentye, Janos
    Doka, Janos
    Kern, Andras
    Gero, Balazs P.
    Sonkoly, Balazs
IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (10) : 7954 - 7972
  • [6] Missing Value Filling Based on the Collaboration of Cloud and Edge in Artificial Intelligence of Things
    Wang, Tian
    Ke, Haoxiong
    Jolfaei, Alireza
    Wen, Sheng
    Haghighi, Mohammad Sayad
    Huang, Shuqiang
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (08) : 5394 - 5402
  • [7] FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge
    Taki, Sifat Ut
    Padmanabhan, Arthi
    Mastorakis, Spyridon
    2024 IEEE/ACM SYMPOSIUM ON EDGE COMPUTING, SEC 2024, 2024, : 98 - 109
  • [8] PArtNNer: Platform-Agnostic Adaptive Edge-Cloud DNN Partitioning for Minimizing End-to-End Latency
    Ghosh, Soumendu Kumar
    Raha, Arnab
    Raghunathan, Vijay
    Raghunathan, Anand
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (01)
  • [9] Taming Serverless Cold Start of Cloud Model Inference With Edge Computing
    Zhao, Kongyange
    Zhou, Zhi
    Jiao, Lei
    Cai, Shen
    Xu, Fei
    Chen, Xu
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8111 - 8128