Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1 ]
Raff, Edward [1 ,2 ,3 ]
Biderman, Stella [2 ,3 ,4 ]
Oates, Tim [1 ]
Holt, James [2 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Keywords
DETECT;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
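The abstract's key technical move is replacing the quadratic query-key interaction with HRR binding (circular convolution) and unbinding (circular correlation), which can be computed with FFTs in O(H log H) per token. Below is a minimal NumPy sketch of those two primitives plus a toy superposition-based retrieval, given purely as an illustration: the names bind, unbind, approx_inverse, and hrr_attention are assumptions of this sketch, not the authors' Hrrformer code, and the toy omits the weighting step behind the abstract's "weighted response of the values."

import numpy as np

def bind(x, y):
    # HRR binding: circular convolution, computed in the Fourier domain in O(H log H).
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=x.shape[-1])

def approx_inverse(y):
    # Involution y_dagger such that bind(y, approx_inverse(y)) approximates the identity vector.
    return np.roll(y[..., ::-1], 1, axis=-1)

def unbind(s, y):
    # HRR unbinding (circular correlation): approximately retrieves what was bound to y inside s.
    return bind(s, approx_inverse(y))

def hrr_attention(Q, K, V):
    # Toy retrieval over T items of dimension H (not the authors' layer): every value is
    # bound to its key, the T bindings are superposed into one H-dimensional trace, and
    # each query unbinds its (noisy) response. Time O(TH log H), memory O(TH).
    s = bind(K, V).sum(axis=0)
    return np.stack([unbind(s, q) for q in Q])

# Tiny usage example with vectors drawn as HRRs conventionally are (i.i.d. N(0, 1/H)).
T, H = 8, 256
rng = np.random.default_rng(0)
Q = K = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
V = rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H))
print(hrr_attention(Q, K, V).shape)  # (8, 256)

Because all T key-value bindings are superposed into a single H-dimensional vector, memory stays linear in sequence length and the per-layer cost is O(TH log H), matching the complexities quoted in the abstract; the retrieved responses are noisy approximations of the stored values rather than exact copies.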
Pages: 490-507
Number of pages: 18
Related Papers
50 records in total
  • [31] Relative molecule self-attention transformer
    Łukasz Maziarka
    Dawid Majchrowski
    Tomasz Danel
    Piotr Gaiński
    Jacek Tabor
    Igor Podolak
    Paweł Morkisz
    Stanisław Jastrzębski
    Journal of Cheminformatics, 16
  • [32] Self-Attention ConvLSTM for Spatiotemporal Prediction
    Lin, Zhihui
    Li, Maomao
    Zheng, Zhuobin
    Cheng, Yangyang
    Yuan, Chun
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11531 - 11538
  • [33] Pyramid Self-attention for Semantic Segmentation
    Qi, Jiyang
    Wang, Xinggang
    Hu, Yao
    Tang, Xu
    Liu, Wenyu
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 480 - 492
  • [34] Anisotropy Is Inherent to Self-Attention in Transformers
    Godey, Nathan
    de la Clergerie, Eric
    Sagot, Benoit
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 35 - 48
  • [35] Self-attention Hypergraph Pooling Network
    Zhao Y.-F.
    Jin F.-S.
    Li R.-H.
    Qin H.-C.
    Cui P.
    Wang G.-R.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (10)
  • [36] Self-Attention Based Video Summarization
    Li Y.
    Wang J.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04): 652 - 659
  • [37] Self-Attention Technology in Image Segmentation
    Cao, Fude
    Lu, Xueyun
    INTERNATIONAL CONFERENCE ON INTELLIGENT TRAFFIC SYSTEMS AND SMART CITY (ITSSC 2021), 2022, 12165
  • [38] Relative molecule self-attention transformer
    Maziarka, Lukasz
    Majchrowski, Dawid
    Danel, Tomasz
    Gainski, Piotr
    Tabor, Jacek
    Podolak, Igor
    Morkisz, Pawel
    Jastrzebski, Stanislaw
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [39] Deformable Self-Attention for Text Classification
    Ma, Qianli
    Yan, Jiangyue
    Lin, Zhenxi
    Yu, Liuhong
    Chen, Zipeng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1570 - 1581
  • [40] The emergence of clusters in self-attention dynamics
    Geshkovski, Borjan
    Letrouit, Cyril
    Polyanskiy, Yury
    Rigollet, Philippe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,