Stealthy Black-Box Attack With Dynamic Threshold Against MARL-Based Traffic Signal Control System

Cited by: 0

Authors:

Ren, Yan [1]

Zhang, Heng [1,2]

Du, Linkang [3]

Zhang, Zhikun [4]

Zhang, Jian [2]

Li, Hongran [2]

Affiliations:

[1] Jiangsu Ocean Univ, Coll Elect Engn, Lianyungang 222000, Peoples R China

[2] Jiangsu Ocean Univ, Coll Comp Engn, Lianyungang 222000, Peoples R China

[3] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310000, Peoples R China

[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310000, Peoples R China

Source:

IEEE Transactions on Industrial Informatics | 2024, Vol. 20, No. 10

Funding:

National Natural Science Foundation of China

Keywords:

Training; Perturbation methods; Heuristic algorithms; Closed box; Optimization; Control systems; Vehicle dynamics; Adversarial attack; deep reinforcement learning (DRL); defense; security; traffic signal control; ROBUSTNESS;

DOI:

10.1109/TII.2024.3413356

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Multiagent reinforcement learning (MARL) promises outstanding performance for multiintersection traffic signal control systems (TSCS), enabling intelligent administration of cities. However, the vulnerability of MARL algorithms to adversarial attacks has raised concerns about the security of TSCS. In this article, we explore the robustness of MARL-based TSCS against adversarial attacks, propose a black-box multiobject attack strategy, and assign an attack budget to ensure stealthiness. We design a dynamic threshold-based selection of critical states to minimize the cumulative reward with a limited number of attacks. In addition, we present a lightweight agnostic dynamic threshold-based defense mechanism by enhancing the worst-case performance of the policy. We formulate it as a min-max optimization problem, i.e., minimizing the quantity of training sample alterations while maximizing the cumulative discount reward of policy against the perturbed states. Extensive experiments on simulation of urban mobility (SUMO) demonstrate that the proposed attack policy can significantly reduce the performance of TSCS.
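The abstract does not state how critical states are scored or how the threshold is updated, so the following is only an illustrative sketch of budgeted, threshold-based step selection, not the authors' algorithm. It assumes a hypothetical per-step criticality score (higher means an attack at that step is expected to hurt the cumulative reward more) and a hypothetical update rule: the threshold is a quantile of recently observed scores, and it loosens or tightens with the share of the attack budget still unspent.

```python
from collections import deque


def select_attack_steps(scores, budget, window=20):
    """Return the indices of the time steps chosen for attack.

    scores: per-step criticality scores (hypothetical; higher = more critical)
    budget: maximum number of attacks allowed (the stealthiness constraint)
    window: number of recent scores used to set the threshold (assumption)
    """
    recent = deque(maxlen=window)
    chosen = []
    horizon = len(scores)
    for t, score in enumerate(scores):
        remaining_budget = budget - len(chosen)
        if remaining_budget <= 0:
            break  # budget exhausted: no further attacks
        remaining_steps = horizon - t
        # Fraction of the remaining steps that can still be attacked.
        rate = min(1.0, remaining_budget / remaining_steps)
        if recent:
            ranked = sorted(recent)
            # Threshold = empirical (1 - rate) quantile of recent scores.
            idx = min(len(ranked) - 1, int((1.0 - rate) * len(ranked)))
            threshold = ranked[idx]
        else:
            threshold = float("inf")  # no history yet: skip this step
        if score >= threshold:
            chosen.append(t)
        recent.append(score)
    return chosen


if __name__ == "__main__":
    demo_scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.95, 0.1, 0.2]
    print(select_attack_steps(demo_scores, budget=2))
```

In this sketch the selector never exceeds the budget, and it becomes more selective after each attack because the affordable attack rate drops. A real attacker in the black-box setting would have to estimate the scores from observations alone, which this sketch leaves out.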


Pages: 12021-12031

Page count: 11
