SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation

Cited by: 0
Authors
Kasanishi, Tetsu [1]
Isonuma, Masaru [1]
Mori, Junichiro [1, 2]
Sakata, Ichiro [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] RIKEN Ctr Adv Intelligence Project, Wako, Saitama, Japan
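Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract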
Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder (Izacard and Grave, 2021) extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation, such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.
Pages: 6695-6708
Page count: 14
Related papers
50 items in total
  • [31] Dungeons and Data: A Large-Scale NetHack Dataset
    Hambro, Eric
    Raileanu, Roberta
    Rothermel, Danielle
    Mella, Vegard
    Rocktaschel, Tim
    Kuttler, Heinrich
    Murray, Naila
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [32] Performance Visualization for Large-Scale Computing Systems: A Literature Review
    Gao, Qin
    Zhang, Xuhui
    Rau, Pei-Luen Patrick
    Maciejewski, Anthony A.
    Siegel, Howard Jay
    HUMAN-COMPUTER INTERACTION: DESIGN AND DEVELOPMENT APPROACHES, PT I, 2011, 6761: 450-460
  • [33] The use of process data in large-scale assessments: a literature review
    Anghel, Ella
    Khorramdel, Lale
    von Davier, Matthias
    LARGE-SCALE ASSESSMENTS IN EDUCATION, 2024, 12 (01)
  • [34] Practices for Large-Scale Agile Transformations: A Systematic Literature Review
    Trippensee, Lennard
    Remane, Gerrit
    DIGITAL INNOVATION AND ENTREPRENEURSHIP (AMCIS 2021), 2021
  • [35] A review on the effect of large-scale PV generation on power systems
    Ding, Ming
    Wang, Weisheng
    Wang, Xiuli
    Song, Yunting
    Chen, Dezhi
    Sun, Ming
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2014, 34 (01): 1-14
  • [36] Large-Scale Automatic Audiobook Creation
    Walsh, Brendan
    Hamilton, Mark
    Newby, Greg
    Wang, Xi
    Ruan, Serena
    Zhao, Sheng
    He, Lei
    Zhang, Shaofei
    Dettinger, Eric
    Freeman, William T.
    Weimer, Markus
    INTERSPEECH 2023, 2023: 3675-3676
  • [37] AUTOMATIC GENERATION OF STOCHASTICALLY DOMINANT FAILURE MODE FOR LARGE-SCALE STRUCTURES.
    Murotsu, Yoshisada
    Matsuzaki, Satoshi
    Okada, Hiroo
    Nippon Kikai Gakkai Ronbunshu, A Hen/Transactions of the Japan Society of Mechanical Engineers, Part A, 1986, 52 (478): 1634-1640
  • [38] Dispatch Strategy of Large-scale Wind Farm Automatic Generation Control System
    Zhang, X.
    Xu, Z. Y.
    Iqbal, M. J.
    Yang, Q. X.
    2014 IEEE PES GENERAL MEETING - CONFERENCE & EXPOSITION, 2014
  • [39] SDFC dataset: a large-scale benchmark dataset for hyperspectral image classification
    Sun, Liwei
    Zhang, Junjie
    Li, Jia
    Wang, Yueming
    Zeng, Dan
    OPTICAL AND QUANTUM ELECTRONICS, 2023, 55 (02)
  • [40] The Blackbird Dataset: A Large-Scale Dataset for UAV Perception in Aggressive Flight
    Antonini, Amado
    Guerra, Winter
    Murali, Varun
    Sayre-McCord, Thomas
    Karaman, Sertac
    PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON EXPERIMENTAL ROBOTICS, 2020, 11: 130-139