Fast Monte-Carlo Approximation of the Attention Mechanism

Times cited: 0

Authors:

Kim, Hyunjun [1]

Ko, JeongGil [1]

Affiliation:

[1] Yonsei University, School of Integrated Technology, Seoul, South Korea

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce Monte-Carlo Attention (MCA), a randomized approximation method for reducing the computational cost of self-attention mechanisms in Transformer architectures. MCA exploits the fact that the importance of each token in an input sequence varies with its attention score; thus, some degree of error can be tolerated when encoding tokens with low attention. Using approximate matrix multiplication, MCA applies different error bounds when encoding input tokens, so that tokens with low attention scores are computed with relaxed precision while errors on salient elements are minimized. MCA can operate in parallel with other attention optimization schemes and does not require model modification. We study the theoretical error bounds and demonstrate that MCA reduces attention complexity (in FLOPs) for various Transformer models by up to 11× on the GLUE benchmark without compromising model accuracy. Source code and appendix: https://github.com/eis-lab/monte-carlo-attention
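The abstract combines two ideas: Monte-Carlo approximate matrix multiplication, and a per-token precision budget driven by attention scores. The sketch below illustrates that combination under stated assumptions; it is not the authors' implementation (see the repository above for that). It assumes the approximated product is a per-token row times a weight matrix (named W_v here, as in a value projection), uses the classic norm-proportional sampling estimator for approximate matrix multiplication, and takes a precomputed per-token importance score as input (for example, the attention a token receives). The function names mc_row_times_matrix, sample_budget and mc_attention_values, and the linear budget rule, are hypothetical.

```python
import math
import random


def mc_row_times_matrix(x, W, num_samples, rng):
    """Unbiased Monte-Carlo estimate of the row-vector product x @ W.

    Draws inner indices k with probability p_k proportional to |x[k]| * ||W[k]||
    (norm-proportional sampling) and averages the rescaled terms x[k] * W[k] / p_k.
    """
    d_in, d_out = len(x), len(W[0])
    weights = [abs(x[k]) * math.sqrt(sum(w * w for w in W[k])) for k in range(d_in)]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * d_out  # every term x[k] * W[k] is zero, so x @ W is zero
    probs = [w / total for w in weights]
    est = [0.0] * d_out
    # Indices with p_k == 0 are never drawn, so the division below is safe.
    for k in rng.choices(range(d_in), weights=probs, k=num_samples):
        scale = x[k] / (probs[k] * num_samples)
        for j in range(d_out):
            est[j] += scale * W[k][j]
    return est


def sample_budget(importance, max_samples, min_samples=1):
    """Hypothetical rule: samples per token grow linearly with its importance."""
    top = max(importance)
    if top <= 0:
        return [min_samples] * len(importance)
    return [max(min_samples, round(max_samples * s / top)) for s in importance]


def mc_attention_values(X, W_v, importance, max_samples, seed=0):
    """Encode each token row of X through W_v with a per-token sample budget.

    High-importance tokens get more samples (lower error); a budget that
    covers the full inner dimension falls back to the exact product.
    """
    rng = random.Random(seed)
    rows = []
    for x, s in zip(X, sample_budget(importance, max_samples)):
        if s >= len(x):
            rows.append([sum(x[k] * W_v[k][j] for k in range(len(x)))
                         for j in range(len(W_v[0]))])
        else:
            rows.append(mc_row_times_matrix(x, W_v, s, rng))
    return rows


if __name__ == "__main__":
    X = [[1.0, 2.0, 0.5], [0.1, -0.3, 0.2]]
    W_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(mc_attention_values(X, W_v, importance=[0.9, 0.1], max_samples=3))
```

In the usage example, the high-importance token receives a budget of 3 samples, which covers the inner dimension, so its row is computed exactly as [1.5, 2.5]. The low-importance token is estimated from a single sampled term. That estimate is unbiased but noisy, which mirrors the "relaxed precision" the abstract describes for low-attention tokens.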


Pages: 7185-7193

Number of pages: 9
