Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1 ]
Raff, Edward [1 ,2 ,3 ]
Biderman, Stella [2 ,3 ,4 ]
Oates, Tim [1 ]
Holt, James [2 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Keywords
DETECT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
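For readers unfamiliar with HRR, the mechanism the abstract alludes to can be illustrated with a short sketch: binding is circular convolution (computed with FFTs, which is where the O(TH log H) term comes from), superposition is elementwise addition, and retrieval probes the superposed trace with an approximate inverse of the query. The NumPy sketch below is an illustration of that query/key/value retrieval idea only, not the authors' Hrrformer implementation (see the repository linked above); the names bind, approx_inverse, and hrr_attention are placeholders introduced here.

import numpy as np

def bind(a, b):
    # HRR binding = circular convolution, done in the Fourier domain:
    # O(H log H) per vector instead of O(H^2).
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def approx_inverse(a):
    # HRR approximate inverse (involution): keep element 0, reverse the rest.
    return np.concatenate([a[..., :1], a[..., :0:-1]], axis=-1)

def hrr_attention(queries, keys, values):
    # Bind each key to its value, superpose all T bindings into a single
    # H-dimensional trace, then probe the trace with each inverted query.
    trace = bind(keys, values).sum(axis=0, keepdims=True)   # (1, H)
    return bind(np.broadcast_to(trace, queries.shape),      # (T, H)
                approx_inverse(queries))

# Toy usage: H-dimensional vectors drawn ~ N(0, 1/H), as is typical for HRR.
rng = np.random.default_rng(0)
T, H = 8, 256
q = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
k = q                                  # query i should retrieve value i
v = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
out = hrr_attention(q, k, v)           # out[i] is a noisy approximation of v[i]
print(out.shape)                       # (8, 256)

Because the T key-value bindings collapse into one H-dimensional trace, no T x T attention matrix is ever materialized, which is consistent with the O(TH) space complexity stated in the abstract.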
Pages: 490-507
Number of pages: 18