Network Group Partition and Core Placement Optimization for Neuromorphic Multi-Core and Multi-Chip Systems

Cited by: 0
Authors
Yang, Yukuan [1, 2]
Fan, Qihang [3 ]
Yan, Tianyi [4 ]
Pei, Jing [3 ]
Li, Guoqi [5, 6]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China
[2] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrument, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Precis Instrument, Beijing 100084, Peoples R China
[4] Beijing Inst Technol, Sch Life Sci, Beijing 100081, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Beijing 100045, Peoples R China
[6] Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
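Funding
National Science Foundation (United States); Beijing Natural Science Foundation; National Key Research and Development Program of China; National Natural Science Foundation of China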
Keywords
Network group partition; core placement optimization; neuromorphic chips; multi-core and multi-chip systems; CHIP;
DOI
10.1109/TETCI.2024.3379165
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neuromorphic chips with multi-core architectures are considered highly promising for the next generation of artificial intelligence (AI) chips because they avoid the memory wall effect. Deploying deep neural networks (DNNs) on these chips requires two stages: network partition and core placement. For network partition, existing schemes are mostly manual or focus only on single-layer, small-scale network partitions. For core placement, to the best of our knowledge, no existing work has completely solved the clock-level communication deadlock problem that commonly arises in applications of neuromorphic multi-core and multi-chip (NMCMC) systems. To address these issues, which affect the operating and deployment efficiency of NMCMC systems, we formulate the network group partition problem as an optimization problem for the first time and propose a search-based network group partition scheme to solve it. A clock-level multi-chip simulator is established to completely avoid the deadlock problem during core placement optimization. In addition, a region constrained simulated annealing (RCSA) algorithm is proposed to improve the efficiency of core placement optimization. Finally, an automated toolchain for the efficient deployment of DNNs on NMCMC systems is developed by integrating the proposed network group partition and core placement schemes. Experiments show that, compared with existing manual schemes, the proposed group partition scheme uses 22.25%, 17.77%, and 14.80% fewer cores, improves memory utilization by 9.44%, 7.96%, and 5.16%, and yields more balanced communication and computation loads on ResNet-18, ResNet-34, and ResNet-50, respectively. Moreover, the proposed RCSA-based core placement optimization is more efficient, requiring far fewer optimization steps, and achieves 9.52%, 11.91%, and 27.52% higher throughput than deadlock-free sequential core placement on the ResNet-18, ResNet-34, and ResNet-50 networks, respectively. This work paves the way for applying NMCMC systems to real-world scenarios to achieve more powerful machine intelligence.
Pages: 3966-3981
Page count: 16
Related papers
50 in total
  • [31] A hybrid optimization approach for chip placement of multi-chip module packaging
    Yang, Ping
    Qin, Xiangnan
    MICROELECTRONICS JOURNAL, 2009, 40 (08) : 1235 - 1243
  • [32] Optimization of the SIFT Key Algorithms on Multi-Core DSP Systems
    Luo Yong
    Chen Yuanzhi
    2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 969 - 973
  • [33] An Automatic Task Partition Method for Multi-core System
    Jing, Minge
    Huang, Yujie
    Fan, Yibo
    Xue, Xiaoyong
    Zeng, Xiaoyang
    Yu, Zhiyi
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [34] Optimization of Parallel Discrete Event Simulator for Multi-core Systems
    Jagtap, Deepak
    Abu-Ghazaleh, Nael
    Ponomarev, Dmitry
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 520 - 531
  • [35] A Multi-Core/Multi-Chip Scalable Architecture of Associative Processors Employing Bell-Shaped Analog Matching Cells
    Bui, Trong Tu
    Shibata, Tadashi
    2008 9TH INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED-CIRCUIT TECHNOLOGY, VOLS 1-4, 2008, : 1811 - +
  • [36] Evaluating the Performance of Network Protocol Processing on Multi-core Systems
    Faulkner, Matthew
    Brampton, Andrew
    Pink, Stephen
    2009 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, 2009, : 16 - 23
  • [37] SpikeNC: An Accurate and Scalable Simulator for Spiking Neural Network on Multi-Core Neuromorphic Hardware
    Xie, Lisheng
    Xue, Jianwei
    Wu, Liangshun
    Chen, Faquan
    Tian, Qingyang
    Zhou, Yifan
    Ying, Rendong
    Liu, Peilin
    2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 357 - 366
  • [38] Reliability Optimization on Multi-Core Systems with Multi-Tasking and Redundant Multi-Threading
    Chen, Kuan-Hsun
    von der Brueggen, Georg
    Chen, Jian-Jia
    IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (04) : 484 - 497
  • [39] Flexible On-Line Reconfiguration of Multi-Core Neuromorphic Platforms
    Barchi, Francesco
    Urgese, Gianvito
    Siino, Alessandro
    Cataldo, Santa Di
    Macii, Enrico
    Acquaviva, Andrea
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2021, 9 (02) : 915 - 927
  • [40] Run-Time Cache-Partition Controller for Multi-Core Systems
    Danielsson, Jakob
    Jagemar, Marcus
    Behnam, Moris
    Seceleanu, Tiberiu
    Sjodin, Mikael
    45TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY (IECON 2019), 2019, : 4509 - 4515