JASMINE: Arabic GPT Models for Few-Shot Learning

Cited by: 0
Authors
Nagoudi, El Moatez Billah [1 ,2 ]
Abdul-Mageed, Muhammad [1 ,2 ,3 ,4 ]
Elmadany, AbdelRahim [1 ,2 ]
Inciarte, Alcides Alcoba [1 ,2 ]
Khondaker, Md Tawkat Islam [1 ,2 ]
Affiliations
[1] Univ British Columbia, Deep Learning, Vancouver, BC, Canada
[2] Univ British Columbia, Nat Language Proc Grp, Vancouver, BC, Canada
[3] MBZUAI, Dept Nat Language Proc, Abu Dhabi, U Arab Emirates
[4] MBZUAI, Dept Machine Learning, Abu Dhabi, U Arab Emirates
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Scholarship on generative pretraining (GPT) remains acutely Anglocentric, leaving serious gaps in our understanding of the whole class of autoregressive models. For example, we have little knowledge about the potential of these models and their societal impacts in diverse linguistic and cultural settings. We alleviate this issue for Arabic, a wide collection of languages and dialectal varieties spoken by a population of roughly 450 million, by introducing JASMINE. JASMINE is a suite of powerful Arabic autoregressive Transformer language models, ranging in size from 300 million to 6.7 billion parameters, pretrained on a large and diverse dataset (~235GB of text). We also carefully design and release a comprehensive benchmark for both automated and human evaluation of Arabic autoregressive models, with coverage of potential social biases, harms, and toxicity. Using our novel benchmark, we evaluate JASMINE extensively, showing powerful performance both intrinsically and in few-shot learning on a wide range of NLP tasks. We aim to responsibly release our models and evaluation benchmark to interested researchers, along with code for experimenting with them.
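To make the few-shot setting concrete, below is a minimal sketch of in-context (few-shot) prompting with an autoregressive model via the Hugging Face transformers API. The checkpoint id, prompt format, and demonstrations are hypothetical illustrations only, not the paper's released models or benchmark; substitute the actual JASMINE checkpoint name once available.

```python
# Minimal few-shot prompting sketch with an autoregressive LM.
# Assumes the Hugging Face transformers API; the checkpoint id is a
# placeholder, NOT the confirmed JASMINE release name.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "UBC-NLP/jasmine-350m"  # hypothetical checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Few-shot sentiment classification: the prompt concatenates k labeled
# demonstrations followed by an unlabeled query; the model completes
# the label of the final example without any gradient updates.
demonstrations = [
    ("خدمة ممتازة وسريعة", "إيجابي"),  # "Excellent, fast service" -> positive
    ("تجربة سيئة للغاية", "سلبي"),     # "A very bad experience"   -> negative
]
query = "المنتج جيد جدا وأنصح به"      # "The product is very good; I recommend it"

prompt = "".join(f"نص: {text}\nتصنيف: {label}\n\n"
                 for text, label in demonstrations)
prompt += f"نص: {query}\nتصنيف:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)

# Decode only the newly generated tokens (the predicted label).
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```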
Pages: 16721-16744
Page count: 24