SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation

Authors
Kasanishi, Tetsu [1]
Isonuma, Masaru [1]
Mori, Junichiro [1, 2]
Sakata, Ichiro [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] RIKEN Ctr Adv Intelligence Project, Wako, Saitama, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to the progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder (Izacard and Grave, 2021) extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.
Pages: 6695-6708
Number of pages: 14