Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1 ]
Raff, Edward [1 ,2 ,3 ]
Biderman, Stella [2 ,3 ,4 ]
Oates, Tim [1 ]
Holt, James [2 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Keywords
DETECT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
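For readers unfamiliar with HRR, the mechanism the abstract alludes to can be illustrated with a short sketch: binding is circular convolution (computed with FFTs, which is where the O(TH log H) term comes from), superposition is elementwise addition, and retrieval probes the superposed trace with an approximate inverse of the query. The NumPy sketch below is an illustration of that query/key/value retrieval idea only, not the authors' Hrrformer implementation (see the repository linked above); the names bind, approx_inverse, and hrr_attention are placeholders introduced here.

import numpy as np

def bind(a, b):
    # HRR binding = circular convolution, done in the Fourier domain:
    # O(H log H) per vector instead of O(H^2).
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def approx_inverse(a):
    # HRR approximate inverse (involution): keep element 0, reverse the rest.
    return np.concatenate([a[..., :1], a[..., :0:-1]], axis=-1)

def hrr_attention(queries, keys, values):
    # Bind each key to its value, superpose all T bindings into a single
    # H-dimensional trace, then probe the trace with each inverted query.
    trace = bind(keys, values).sum(axis=0, keepdims=True)   # (1, H)
    return bind(np.broadcast_to(trace, queries.shape),      # (T, H)
                approx_inverse(queries))

# Toy usage: H-dimensional vectors drawn ~ N(0, 1/H), as is typical for HRR.
rng = np.random.default_rng(0)
T, H = 8, 256
q = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
k = q                                  # query i should retrieve value i
v = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
out = hrr_attention(q, k, v)           # out[i] is a noisy approximation of v[i]
print(out.shape)                       # (8, 256)

Because the T key-value bindings collapse into one H-dimensional trace, no T x T attention matrix is ever materialized, which is consistent with the O(TH) space complexity stated in the abstract.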
Pages: 490-507
Number of pages: 18