TSAR-ILP: Tile-Based, Synchronization-AwaRe ILP Allocating Heterogeneous Platforms for Streaming Applications

Cited by: 0

Authors:

Morais, Bruno [1]

Zhang, Jinghan [1]

Schirner, Gunar [1]

Affiliation:

[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2023 / Vol. 42 / No. 11

Keywords:

Resource management; Synchronization; Throughput; Computer architecture; Schedules; Pipelines; Scalability; Accelerator-rich platform; design space exploration (DSE); embedded systems; hardware-software (HW/SW) co-design; system on chip; ARCHITECTURE; ALGORITHM; MODELS

DOI:

10.1109/TCAD.2023.3274050

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Automatic design space exploration (DSE) is key in hardware-software (HW/SW) co-design. To cope with the large design space, explorations are often heuristic-based and/or approximate, yielding potentially locally optimal solutions. Without knowing the globally optimal solution, strong assertions about performance upper/lower bounds cannot be made. In contrast, integer linear programming (ILP) formulations can produce exact (optimal) solutions. Previous ILP-based formulations, however, lack support for tile-based architectures and realistic synchronization models, limiting their DSE capabilities. This work introduces a tile-based, synchronization-aware ILP (TSAR-ILP) formulation that overcomes previous limitations. With TSAR-ILP, the allocation/binding problems are introduced and formalized, attaining optimal solutions for mapping streaming applications onto template platforms. Using TSAR-ILP, this work explores a hardware accelerator-rich (HWACC-rich) platform with direct HWACC-to-HWACC communication under HW area constraints for 40 OpenVX applications. To illustrate design opportunities given by: 1) the ILP formulation and 2) direct HWACC-to-HWACC communication, this article analyzes the impact of job size. Results show that selecting smaller job sizes yields performance improvements and less area usage at the cost of slightly increased synchronization overhead. A job size reduction from 1 kB to 256 bytes gives a $3.51\times$ average performance increase across 40 applications. Finally, DSE with TSAR-ILP is shown not to be prohibitive through scalability analysis using a set of 5000 synthetic applications with varying size (10-125 nodes), with 94.3% of applications successfully achieving optimal solutions under 60 s.
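
The allocation/binding problem mentioned in the abstract can be illustrated with a toy sketch. This is not the TSAR-ILP formulation: it ignores tiles, synchronization, and HWACC-to-HWACC communication, and it replaces the ILP solver with brute-force enumeration, which is also exact but only feasible for tiny inputs. The stage names, per-job times, area costs, area budget, and the single shared CPU model are all hypothetical assumptions made for illustration.

```python
from itertools import product

# Hypothetical 4-stage streaming pipeline; every number here is illustrative.
# Each stage: (name, software time per job, accelerator time per job, accelerator area)
STAGES = [
    ("blur",   8.0, 2.0, 3),
    ("sobel",  6.0, 1.5, 2),
    ("thresh", 2.0, 1.0, 1),
    ("dilate", 5.0, 2.5, 2),
]
AREA_BUDGET = 5  # hypothetical area units available for accelerators


def best_binding(stages, area_budget):
    """Enumerate every SW/HW binding and return (bottleneck, area, binding).

    binding[i] is True when stage i is bound to its own accelerator.
    Assumed model: software stages share one CPU (their times add up),
    each accelerator runs in parallel, and pipeline throughput is limited
    by the slowest resource (the bottleneck time per job).
    """
    best = None
    for choice in product((False, True), repeat=len(stages)):
        area = sum(s[3] for s, hw in zip(stages, choice) if hw)
        if area > area_budget:
            continue  # violates the area constraint
        sw_time = sum(s[1] for s, hw in zip(stages, choice) if not hw)
        hw_times = [s[2] for s, hw in zip(stages, choice) if hw]
        bottleneck = max([sw_time] + hw_times)
        candidate = (bottleneck, area, choice)
        # Lower bottleneck wins; ties are broken by smaller area.
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    bottleneck, area, binding = best_binding(STAGES, AREA_BUDGET)
    for (name, *_), hw in zip(STAGES, binding):
        print(f"{name}: {'HWACC' if hw else 'SW'}")
    print(f"bottleneck time per job = {bottleneck}, area used = {area}")
```

Because every feasible binding is visited, the result is globally optimal for this toy model, which is the property the abstract contrasts with heuristic exploration. Enumeration grows as 2^n in the number of stages, so it serves only to make the problem statement concrete.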


Pages: 3693-3706

Page count: 14