Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1 ]
Raff, Edward [1 ,2 ,3 ]
Biderman, Stella [2 ,3 ,4 ]
Oates, Tim [1 ]
Holt, James [2 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Keywords
DETECT;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
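The abstract's key technical move is replacing the quadratic query-key interaction with HRR binding (circular convolution) and unbinding (circular correlation), which can be computed with FFTs in O(H log H) per token. Below is a minimal NumPy sketch of those two primitives plus a toy superposition-based retrieval, given purely as an illustration: the names bind, unbind, approx_inverse, and hrr_attention are assumptions of this sketch, not the authors' Hrrformer code, and the toy omits the weighting step behind the abstract's "weighted response of the values."

import numpy as np

def bind(x, y):
    # HRR binding: circular convolution, computed in the Fourier domain in O(H log H).
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=x.shape[-1])

def approx_inverse(y):
    # Involution y_dagger such that bind(y, approx_inverse(y)) approximates the identity vector.
    return np.roll(y[..., ::-1], 1, axis=-1)

def unbind(s, y):
    # HRR unbinding (circular correlation): approximately retrieves what was bound to y inside s.
    return bind(s, approx_inverse(y))

def hrr_attention(Q, K, V):
    # Toy retrieval over T items of dimension H (not the authors' layer): every value is
    # bound to its key, the T bindings are superposed into one H-dimensional trace, and
    # each query unbinds its (noisy) response. Time O(TH log H), memory O(TH).
    s = bind(K, V).sum(axis=0)
    return np.stack([unbind(s, q) for q in Q])

# Tiny usage example with vectors drawn as HRRs conventionally are (i.i.d. N(0, 1/H)).
T, H = 8, 256
rng = np.random.default_rng(0)
Q = K = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
V = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
print(hrr_attention(Q, K, V).shape)  # (8, 256)

Because all T key-value bindings are superposed into a single H-dimensional vector, memory stays linear in sequence length and the per-layer cost is O(TH log H), matching the complexities quoted in the abstract; the retrieved responses are noisy approximations of the stored values rather than exact copies.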
Pages: 490-507
Number of pages: 18
Related Papers
50 records in total
  • [31] Relative molecule self-attention transformer
    Łukasz Maziarka
    Dawid Majchrowski
    Tomasz Danel
    Piotr Gaiński
    Jacek Tabor
    Igor Podolak
    Paweł Morkisz
    Stanisław Jastrzębski
    Journal of Cheminformatics, 16
  • [32] Self-Attention ConvLSTM for Spatiotemporal Prediction
    Lin, Zhihui
    Li, Maomao
    Zheng, Zhuobin
    Cheng, Yangyang
    Yuan, Chun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11531 - 11538
  • [33] Pyramid Self-attention for Semantic Segmentation
    Qi, Jiyang
    Wang, Xinggang
    Hu, Yao
    Tang, Xu
    Liu, Wenyu
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 480 - 492
  • [34] Anisotropy Is Inherent to Self-Attention in Transformers
    Godey, Nathan
    de la Clergerie, Eric
    Sagot, Benoit
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 35 - 48
  • [35] Self-attention Hypergraph Pooling Network
    Zhao Y.-F.
    Jin F.-S.
    Li R.-H.
    Qin H.-C.
    Cui P.
    Wang G.-R.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (10)
  • [36] Self-Attention Based Video Summarization
    Li Y.
    Wang J.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04): 652 - 659
  • [37] Self-Attention Technology in Image Segmentation
    Cao, Fude
    Lu, Xueyun
    INTERNATIONAL CONFERENCE ON INTELLIGENT TRAFFIC SYSTEMS AND SMART CITY (ITSSC 2021), 2022, 12165
  • [38] Relative molecule self-attention transformer
    Maziarka, Lukasz
    Majchrowski, Dawid
    Danel, Tomasz
    Gainski, Piotr
    Tabor, Jacek
    Podolak, Igor
    Morkisz, Pawel
    Jastrzebski, Stanislaw
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [39] Deformable Self-Attention for Text Classification
    Ma, Qianli
    Yan, Jiangyue
    Lin, Zhenxi
    Yu, Liuhong
    Chen, Zipeng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1570 - 1581
  • [40] The emergence of clusters in self-attention dynamics
    Geshkovski, Borjan
    Letrouit, Cyril
    Polyanskiy, Yury
    Rigollet, Philippe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,