Recasting Self-Attention with Holographic Reduced Representations

Cited by: 0
Authors
Alam, Mohammad Mahmudul [1 ]
Raff, Edward [1 ,2 ,3 ]
Biderman, Stella [2 ,3 ,4 ]
Oates, Tim [1 ]
Holt, James [2 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Lab Phys Sci, College Pk, MD 20740 USA
[3] Booz Allen Hamilton, Mclean, VA 22102 USA
[4] EleutherAI, New York, NY USA
Keywords
DETECT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T >= 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including O(TH log H) time complexity, O(TH) space complexity, and convergence in 10x fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280x faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
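The recasting described in the abstract rests on two HRR primitives: binding (circular convolution, computable with the FFT in O(H log H) per vector pair) and unbinding with an approximate inverse. The following is a minimal NumPy sketch of how queries can be matched against a superposition of bound key-value pairs; the function names and the per-position cosine weighting are illustrative assumptions and do not reproduce the authors' Hrrformer implementation (see the linked repository for that).

import numpy as np

def binding(a, b):
    # HRR binding via circular convolution, computed with the FFT in
    # O(H log H) per vector pair (the source of the H log H factor).
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def approx_inverse(a):
    # Involution a_0, a_{H-1}, a_{H-2}, ...: the standard HRR
    # approximate inverse used for unbinding.
    return np.roll(a[..., ::-1], 1, axis=-1)

def hrr_attention_sketch(Q, K, V):
    """Q, K, V: arrays of shape (T, H)."""
    # Bind each key to its value and superpose (sum) over the sequence:
    # O(TH log H) time and O(TH) space, matching the costs in the abstract,
    # with no T x T attention matrix ever formed.
    memory = binding(K, V).sum(axis=0)
    # Unbind with each query to retrieve a noisy estimate of the values
    # associated with matching keys.
    retrieved = binding(Q, approx_inverse(memory)[None, :])
    # Illustrative per-position weighting: scale each value by the cosine
    # similarity between it and the query's noisy retrieval. The actual
    # Hrrformer's cleanup/weighting step differs; this is only a stand-in.
    num = np.sum(retrieved * V, axis=-1)
    den = np.linalg.norm(retrieved, axis=-1) * np.linalg.norm(V, axis=-1) + 1e-8
    weights = num / den
    return weights[:, None] * V

T, H = 8, 64
rng = np.random.default_rng(0)
Q, K, V = [rng.normal(size=(T, H)) / np.sqrt(H) for _ in range(3)]
print(hrr_attention_sketch(Q, K, V).shape)  # -> (8, 64)

The key point of the sketch is that the bound key-value pairs collapse into a single fixed-size superposition, which is what removes the quadratic memory term of standard self-attention.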
Pages: 490-507
Page count: 18
Related papers
50 items in total
  • [41] SELF-ATTENTION FOR INCOMPLETE UTTERANCE REWRITING
    Zhang, Yong
    Li, Zhitao
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8047 - 8051
  • [42] Overcoming a Theoretical Limitation of Self-Attention
    Chiang, David
    Cholak, Peter
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 7654 - 7664
  • [43] Self-Attention Networks for Code Search
    Fang, Sen
    Tan, You-Shuai
    Zhang, Tao
    Liu, Yepang
    INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 134
  • [44] Applying Self-attention for Stance Classification
    Bugueno, Margarita
    Mendoza, Marcelo
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS (CIARP 2019), 2019, 11896 : 51 - 61
  • [45] Adversarial Self-Attention for Language Understanding
    Wu, Hongqiu
    Ding, Ruixue
    Zhao, Hai
    Xie, Pengjun
    Huang, Fei
    Zhang, Min
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13727 - 13735
  • [46] Cascade modeling with multihead self-attention
    Liu, Chaochao
    Wang, Wenjun
    Jiao, Pengfei
    Chen, Xue
    Sun, Yueheng
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [47] On the Expressive Flexibility of Self-Attention Matrices
    Likhosherstov, Valerii
    Choromanski, Krzysztof
    Weller, Adrian
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8773 - 8781
  • [48] A self-attention network for smoke detection
    Jiang, Minghua
    Zhao, Yaxin
    Yu, Feng
    Zhou, Changlong
    Peng, Tao
    FIRE SAFETY JOURNAL, 2022, 129
  • [49] Relevance, valence, and the self-attention network
    Mattan, Bradley D.
    Quinn, Kimberly A.
    Rotshtein, Pia
    COGNITIVE NEUROSCIENCE, 2016, 7 (1-4) : 27 - 28
  • [50] Modeling Localness for Self-Attention Networks
    Yang, Baosong
    Tu, Zhaopeng
    Wong, Derek F.
    Meng, Fandong
    Chao, Lidia S.
    Zhang, Tong
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4449 - 4458