Distributed DNN Inference With Fine-Grained Model Partitioning in Mobile Edge Computing Networks

Citations: 5
Authors
Li, Hui [1 ]
Li, Xiuhua [1 ]
Fan, Qilin [1 ]
He, Qiang [2 ]
Wang, Xiaofei [3 ]
Leung, Victor C. M. [4 ,5 ]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 400000, Peoples R China
[2] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Wuhan 430074, Hubei, Peoples R China
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
[4] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China
[5] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T1Z4, Canada
Keywords
Delays; Multitasking; Internet of Things; Artificial neural networks; Task analysis; Computational modeling; Inference algorithms; Asynchronous advantage actor-critic; distributed DNN inference; mobile edge computing; model partitioning; multi-task learning; RESOURCE-ALLOCATION; AWARE; ACCELERATION
DOI
10.1109/TMC.2024.3357874
CLC Number
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Model partitioning is a promising technique for improving the efficiency of distributed inference by executing partial deep neural network (DNN) models on edge servers (ESs) or Internet-of-Things (IoT) devices. However, due to the heterogeneous resources of ESs and IoT devices in mobile edge computing (MEC) networks, it is non-trivial to guarantee that the DNN inference speed satisfies specific delay constraints. Meanwhile, many existing DNN models have deep and complex architectures with numerous DNN blocks, which leads to a huge search space for fine-grained model partitioning. To address these challenges, we investigate distributed DNN inference with fine-grained model partitioning, with collaboration between ESs and IoT devices. We formulate the problem and propose a multi-task learning based asynchronous advantage actor-critic approach to find a competitive model partitioning policy that reduces DNN inference delay. Specifically, we combine the shared layers of the actor network and critic network via soft parameter sharing, and expand the output layer into multiple branches to determine the model partitioning policy for each DNN block individually. Experimental results demonstrate that the proposed approach outperforms state-of-the-art approaches, reducing total inference delay, edge inference delay, and local inference delay by an average of 4.76%, 10.04%, and 8.03%, respectively, in the considered MEC networks.
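The architecture outlined in the abstract, a shared actor-critic trunk with soft parameter sharing and one output branch per DNN block, can be made concrete with a short sketch. The following is a minimal PyTorch illustration under stated assumptions: the class name `BranchedActorCritic`, the layer sizes, and a binary device-vs-edge choice per block are all hypothetical, and this is not the authors' implementation.

```python
# Minimal sketch of a multi-branch actor-critic with soft parameter sharing.
# All names, dimensions, and the binary action space are assumptions.
import torch
import torch.nn as nn

class BranchedActorCritic(nn.Module):
    def __init__(self, state_dim, num_blocks, num_choices, hidden=128):
        super().__init__()
        # Actor and critic keep separate trunk parameters; "soft" sharing is
        # imposed as a distance penalty between them (see soft_sharing_loss),
        # rather than hard-sharing a single set of weights.
        self.actor_trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.critic_trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # One output branch per DNN block: each branch emits logits over the
        # partition choices (e.g., run on IoT device vs. edge server).
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, num_choices) for _ in range(num_blocks)])
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state):
        feat = self.actor_trunk(state)
        logits = [branch(feat) for branch in self.branches]  # per-block policies
        value = self.value_head(self.critic_trunk(state))    # state value
        return logits, value

    def soft_sharing_loss(self):
        # L2 distance between corresponding trunk parameters: nudges the actor
        # and critic toward similar low-level features without forcing
        # identical weights.
        return sum(((pa - pc) ** 2).sum()
                   for pa, pc in zip(self.actor_trunk.parameters(),
                                     self.critic_trunk.parameters()))

# Sampling one partitioning decision per DNN block from the branch policies.
net = BranchedActorCritic(state_dim=16, num_blocks=8, num_choices=2)
logits, value = net(torch.randn(1, 16))
policy = [torch.distributions.Categorical(logits=l).sample().item() for l in logits]
```

One point this sketch makes clear is why the branching matters: each branch decides for its own block, so a single forward pass yields a full partitioning policy, instead of choosing from a flat action space of size num_choices^num_blocks.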
Pages: 9060 - 9074
Number of pages: 15
Related Papers
50 records in total
  • [31] A Fine-Grained Performance Model of Cloud Computing Centers
    Khazaei, Hamzeh
    Misic, Jelena
    Misic, Vojislav B.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (11) : 2138 - 2147
  • [32] A fine-grained parallel programming model for grid computing
    Yang, GW
    Wang, Q
    Wu, YW
    Huang, DZ
    2004 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING, PROCEEDINGS, 2004, : 613 - 616
  • [33] Fine-grained Program Partitioning for Security
    Huang, Zhen
    Jaeger, Trent
    Tan, Gang
    PROCEEDINGS OF THE 14TH EUROPEAN WORKSHOP ON SYSTEMS SECURITY (EUROSEC 2021), 2021, : 21 - 26
  • [34] Partitioning Techniques for Fine-grained Indexing
    Wu, Eugene
    Madden, Samuel
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 1127 - 1138
  • [35] DNN Real-Time Collaborative Inference Acceleration with Mobile Edge Computing
    Yang, Run
    Li, Yan
    He, Hui
    Zhang, Weizhe
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [37] Fine-Grained Urban Flow Inference
    Ouyang, Kun
    Liang, Yuxuan
    Liu, Ye
    Tong, Zekun
    Ruan, Sijie
    Zheng, Yu
    Rosenblum, David S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2755 - 2770
  • [38] DNN Surgery: Accelerating DNN Inference on the Edge Through Layer Partitioning
    Liang, Huanghuang
    Sang, Qianlong
    Hu, Chuang
    Cheng, Dazhao
    Zhou, Xiaobo
    Wang, Dan
    Bao, Wei
    Wang, Yu
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (03) : 3111 - 3125
  • [39] Joint DNN partitioning and resource allocation for completion rate maximization of delay-aware DNN inference tasks in wireless powered mobile edge computing
    Tian, Xianzhong
    Xu, Pengcheng
    Shen, Yifan
    Shao, Yuheng
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2023, 16 (06) : 2865 - 2878
  • [40] Fine-grained data access control for distributed sensor networks
    Hur, Junbeom
    WIRELESS NETWORKS, 2011, 17 : 1235 - 1249