Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems

Cited by: 0

Authors:

Yang, Yukuan [1,2]

Fan, Qihang [3]

Yan, Tianyi [4]

Pei, Jing [3]

Li, Guoqi [5,6]

Affiliations:

[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China

[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China

[3] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China

[4] Beijing Inst Technol, Sch Life Sci, Beijing 100081, Peoples R China

[5] Chinese Acad Sci, Inst Automat, Beijing 100045, Peoples R China

[6] Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China

Source:

IEEE Transactions on Emerging Topics in Computational Intelligence | 2024 / Vol. 8 / No. 6

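Funding:

U.S. National Science Foundation; Beijing Natural Science Foundation; National Key R&D Program of China; National Natural Science Foundation of China;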

Keywords:

Network group partition; core placement optimization; neuromorphic chips; multi-core and multi-chip systems; CHIP;

DOI:

10.1109/TETCI.2024.3379165

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Neuromorphic chips with multi-core architectures are considered to have great potential as the next generation of artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) to these chips requires two stages, namely network partition and core placement. For network partition, existing schemes are mostly manual or focus only on single-layer, small-scale network partitions. For core placement, to the best of our knowledge, no existing work has completely solved the clock-level communication deadlock problem that commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate the network group partition problem as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid the deadlock problem during the core placement optimization process. Furthermore, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of the core placement optimization. Finally, an automated toolchain for the efficient deployment of DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that, compared with existing manual schemes, the proposed group partition scheme uses 22.25%, 17.77%, and 14.80% fewer cores, improves memory utilization by 9.44%, 7.96%, and 5.16%, and yields more balanced communication and computation loads on ResNet-18, ResNet-34, and ResNet-50, respectively. In addition, the proposed core placement optimization based on the RCSA algorithm is more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput than sequential core placement without deadlock on the ResNet-18, ResNet-34, and ResNet-50 networks, respectively. This work paves the way for applying NMCMC systems to real-world scenarios to reach more powerful machine intelligence.
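
The abstract names a region constrained simulated annealing (RCSA) algorithm for core placement but does not describe its details. The following is only an illustrative sketch of the general idea, not the authors' method. It assumes a fully occupied 2D mesh, a hypothetical cost equal to traffic-weighted Manhattan distance, and a "region constraint" interpreted as restricting swap candidates to cores within a small neighborhood. The names `placement_cost`, `region_constrained_sa`, `radius`, and the cooling schedule are all assumptions, and the clock-level deadlock simulation described in the abstract is not modeled.

```python
import math
import random


def placement_cost(placement, traffic):
    """Assumed cost: sum of traffic volume times Manhattan distance on a 2D mesh."""
    cost = 0
    for (a, b), volume in traffic.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        cost += volume * (abs(xa - xb) + abs(ya - yb))
    return cost


def region_constrained_sa(placement, traffic, radius=1, steps=2000,
                          t_start=5.0, t_end=0.01, seed=0):
    """Simulated annealing where swaps are limited to a local region (assumed rule)."""
    rng = random.Random(seed)
    current = dict(placement)
    current_cost = placement_cost(current, traffic)
    best, best_cost = dict(current), current_cost
    nodes = sorted(current)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        a = rng.choice(nodes)
        xa, ya = current[a]
        # Region constraint: only consider cores within `radius` (Chebyshev distance).
        candidates = [
            n for n in nodes
            if n != a
            and max(abs(current[n][0] - xa), abs(current[n][1] - ya)) <= radius
        ]
        if not candidates:
            continue
        b = rng.choice(candidates)
        current[a], current[b] = current[b], current[a]
        new_cost = placement_cost(current, traffic)
        delta = new_cost - current_cost
        # Metropolis acceptance: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[a], current[b] = current[b], current[a]  # revert the swap
    return best, best_cost


# Hypothetical example: 9 logical cores forming a chain, randomly placed on a 3x3 mesh.
_rng = random.Random(1)
cells = [(x, y) for x in range(3) for y in range(3)]
_rng.shuffle(cells)
initial = {i: cells[i] for i in range(9)}
chain_traffic = {(i, i + 1): 1 for i in range(8)}
best_placement, best_cost = region_constrained_sa(initial, chain_traffic)
```

Restricting candidates to a neighborhood shrinks the move set evaluated at each step, which is one plausible way a region constraint could reduce the number of optimization steps. Whether the paper's RCSA works this way cannot be confirmed from the abstract alone.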

Pages: 3966-3981

Page count: 16
