Span Selection Pre-training for Question Answering

Cited: 0
Authors
Glass, Michael [1 ]
Gliozzo, Alfio [1 ]
Chakravarti, Rishav [1 ]
Ferritto, Anthony [1 ]
Pan, Lin [1 ]
Bhargav, G. P. Shrivatsa [2 ]
Garg, Dinesh [1 ]
Sil, Avirup [1 ]
Affiliations
[1] IBM Res AI, Armonk, NY 10504 USA
[2] IISC, Dept CSA, Bangalore, Karnataka, India
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task, inspired by reading comprehension, to better align pre-training with understanding rather than memorization. Span Selection Pre-Training (SSPT) poses cloze-like training instances, but rather than drawing the answer from the model's parameters, the model selects it from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact on HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point, outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, substantially improving the learning curve.
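As a rough illustration of the SSPT setup described in the abstract, the sketch below constructs a cloze-like training instance whose answer must be selected from a relevant passage rather than recalled from model parameters. This is a minimal reconstruction from the abstract alone, assuming a simple string-matching pipeline; the `SSPTInstance` fields, the `make_instance` helper, and the `[BLANK]` token are illustrative assumptions, not the authors' actual data-generation code.

```python
# Minimal sketch of SSPT instance construction (assumed, not the authors' pipeline):
# blank a term out of a sentence and require the model to find that term
# as a span in a relevant passage, rather than recall it from its parameters.
from dataclasses import dataclass
from typing import Optional

BLANK = "[BLANK]"  # placeholder token for the removed answer term (assumed name)

@dataclass
class SSPTInstance:
    query: str    # cloze-style sentence with the answer term replaced by [BLANK]
    passage: str  # relevant passage that actually contains the answer term
    start: int    # character offset where the answer span begins in the passage
    end: int      # character offset one past the end of the answer span

def make_instance(sentence: str, answer: str, passage: str) -> Optional[SSPTInstance]:
    """Blank the answer out of the sentence and locate it in the passage.

    Returns None when no valid instance can be built, i.e. when the answer
    does not occur in the sentence or in the passage.
    """
    if answer not in sentence:
        return None
    pos = passage.find(answer)
    if pos < 0:
        return None  # the passage must contain the answer span for span selection
    query = sentence.replace(answer, BLANK, 1)
    return SSPTInstance(query=query, passage=passage, start=pos, end=pos + len(answer))

# Toy usage: a span-selection model would be trained to point at passage[start:end].
inst = make_instance(
    sentence="The Eiffel Tower is located in Paris.",
    answer="Paris",
    passage="Paris, the capital of France, hosts the Eiffel Tower on the Champ de Mars.",
)
print(inst.query)                         # The Eiffel Tower is located in [BLANK].
print(inst.passage[inst.start:inst.end])  # Paris
```

Because the supervision signal is a span offset in the passage, this objective mirrors extractive MRC heads (start/end pointers over passage tokens), which is consistent with the abstract's claim that SSPT aligns pre-training with reading comprehension.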
Pages: 2773-2782
Page count: 10
Related Papers
50 records in total
  • [31] Context-Aware Transformer Pre-Training for Answer Sentence Selection
    Di Liello, Luca
    Garg, Siddhant
    Moschitti, Alessandro
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 458 - 468
  • [32] Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
    Deutsch, Daniel
    Roth, Dan
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 575 - 588
  • [33] Multi-stage Pre-training over Simplified Multimodal Pre-training Models
    Liu, Tongtong
    Feng, Fangxiang
    Wang, Xiaojie
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2556 - 2565
  • [34] Smaller Can Be Better: Efficient Data Selection for Pre-training Models
    Fang, Guang
    Wang, Shihui
    Wang, Mingxin
    Yang, Yulan
    Huang, Hao
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 327 - 342
  • [35] Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
    Dong, Haoyu
    Cheng, Zhoujun
    He, Xinyi
    Zhou, Mengyu
    Zhou, Anda
    Zhou, Fan
    Liu, Ao
    Han, Shi
    Zhang, Dongmei
    PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 5426 - 5435
  • [36] Ensemble ALBERT and RoBERTa for Span Prediction in Question Answering
    Bachina, Sony
    Balumuri, Spandana
    Kamath, Sowmya S.
    1ST WORKSHOP ON DOCUMENT-GROUNDED DIALOGUE AND CONVERSATIONAL QUESTION ANSWERING (DIALDOC 2021), 2021, : 63 - 68
  • [37] MultiSpanQA: A Dataset for Multi-Span Question Answering
    Li, Haonan
    Vasardani, Maria
    Tomko, Martin
    Baldwin, Timothy
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1250 - 1260
  • [38] Question Answering with Long Multiple-Span Answers
    Zhu, Ming
    Ahuja, Aman
    Juan, Da-Cheng
    Wei, Wei
    Reddy, Chandan K.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3840 - 3849
  • [39] Rethinking ImageNet Pre-training
    He, Kaiming
    Girshick, Ross
    Dollar, Piotr
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4917 - 4926
  • [40] Photo Pre-Training, But for Sketch
    Ke, L.
    Pang, Kaiyue
    Song, Yi-Zhe
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2754 - 2764