Enhancing Transformer with Horizontal and Vertical Guiding Mechanisms for Neural Language Modeling

Cited by: 0
Authors
Qu, Anlin [1 ,2 ]
Niu, Jianwei [1 ,2 ,3 ,4 ]
Mo, Shasha [1 ,2 ,5 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
[3] Beihang Univ, Hangzhou Innovat Res Inst, Hangzhou 310051, Peoples R China
[4] Zhengzhou Univ, Res Inst Ind Technol, Zhengzhou 450001, Peoples R China
[5] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
neural language modeling; transformer; attention mechanism; information guiding;
DOI
10.1109/ICC42927.2021.9500450
Chinese Library Classification (CLC)
TN [electronic technology, communication technology];
Discipline classification code
0809;
Abstract
Language modeling is an important problem in Natural Language Processing (NLP), and the multi-layer Transformer network is currently the most advanced and effective model for this task. However, its multi-head self-attention structure has two inherent defects: (1) attention information loss: lower-level attention weights cannot be explicitly passed to upper layers, so the network may lose pivotal attention information captured by lower layers; (2) multi-head bottleneck: each head in the vanilla Transformer has a relatively small dimension and is processed independently of the others, which introduces an expressive bottleneck and fundamentally limits subspace learning. To overcome these two weaknesses, this paper proposes a novel neural architecture named Guide-Transformer, which uses horizontal and vertical attention information to guide the original multi-head self-attention sublayer without introducing excessive complexity. Experimental results on three authoritative language modeling benchmarks demonstrate the effectiveness of Guide-Transformer: on the popular perplexity (ppl) and bits-per-character (bpc) metrics, it achieves moderate improvements over a strong baseline model.
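The abstract describes the two guiding directions only at a high level, so the following PyTorch sketch is merely one plausible reading of them, not the paper's actual formulation (see the DOI above for that). Here "vertical" guiding is rendered as a learned gate that blends the previous layer's attention weights into the current layer's, and "horizontal" guiding as a learned mixing of attention maps across heads. All names (GuidedSelfAttention, head_mix, alpha, prev_attn) are hypothetical, introduced purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSelfAttention(nn.Module):
    # Hypothetical sketch: a self-attention sublayer with (a) "vertical"
    # guiding, blending the previous layer's attention weights into this
    # layer's, and (b) "horizontal" guiding, mixing attention maps across
    # heads so the heads are no longer fully independent.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Vertical gate (assumed form): sigmoid(alpha) weighs current vs.
        # lower-layer attention; initialized to an even 0.5/0.5 blend.
        self.alpha = nn.Parameter(torch.tensor(0.0))
        # Horizontal mixing matrix (assumed form): row-softmaxed so each
        # head receives a convex combination of all heads' attention maps.
        self.head_mix = nn.Parameter(torch.eye(n_heads))

    def forward(self, x, prev_attn=None):
        # x: (batch, seq, d_model); prev_attn: (batch, heads, seq, seq) or
        # None. Causal masking, needed for language modeling, is omitted.
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        # Horizontal guiding: couple the heads through the mixing matrix.
        attn = torch.einsum('hg,bgij->bhij', self.head_mix.softmax(dim=-1), attn)
        # Vertical guiding: gate in the attention passed up from below.
        if prev_attn is not None:
            g = torch.sigmoid(self.alpha)
            attn = g * attn + (1 - g) * prev_attn
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), attn  # return attn so the next layer can be guided

Stacking such layers and feeding each layer's returned attn into the next layer's prev_attn gives lower-level attention an explicit path upward, addressing defect (1), while head_mix couples the otherwise independent heads, addressing defect (2); dropout and the rest of the Transformer block are omitted for brevity.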
Pages: 6
Related Papers
50 records in total
  • [1] LANGUAGE MODELING WITH TRANSFORMER
    Zhang, Jian Guo
    Li, Jian Ping
    Li, Huang
    2019 16TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICWAMTIP), 2019, : 249 - 253
  • [2] Vertical and horizontal transmission in language evolution
    Wang, WSY
    Minett, JW
    TRANSACTIONS OF THE PHILOLOGICAL SOCIETY, 2005, 103 (02) : 121 - 146
  • [3] A Tensorized Transformer for Language Modeling
    Ma, Xindian
    Zhang, Peng
    Zhang, Shuai
    Duan, Nan
    Hou, Yuexian
    Song, Dawei
    Zhou, Ming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Enhancing vertical efficiency through horizontal licensing
    Arya, A
    Mittendorf, B
    JOURNAL OF REGULATORY ECONOMICS, 2006, 29 (03) : 333 - 342
  • [5] HORIZONTAL AND VERTICAL PATHWAYS IN NEURAL INDUCTION
    GUTHRIE, S
    TRENDS IN NEUROSCIENCES, 1991, 14 (04) : 123 - 126
  • [6] BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks
    Oh, Jong-Hoon
    Iida, Ryu
    Kloetzer, Julien
    Torisawa, Kentaro
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2103 - 2115
  • [7] Horizontal and vertical prism adaptation are different mechanisms
    Brautaset, RL
    Jennings, JAM
    OPHTHALMIC AND PHYSIOLOGICAL OPTICS, 2005, 25 (03) : 215 - 218
  • [8] Horizontal Power, Vertical Weakness: Enhancing the "Circuit of Culture"
    Champ, Joseph G.
    POPULAR COMMUNICATION, 2008, 6 (02) : 85 - 102
  • [9] Horizontal and Vertical Determination of Mental and Neural States
    Harbecke, Jens
    Atmanspacher, Harald
    JOURNAL OF THEORETICAL AND PHILOSOPHICAL PSYCHOLOGY, 2012, 32 (03): : 161 - 179