SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation

Times cited: 0

Authors:

Kasanishi, Tetsu [1]

Isonuma, Masaru [1]

Mori, Junichiro [1,2]

Sakata, Ichiro [1]

Affiliations:

[1] University of Tokyo, Tokyo, Japan

[2] RIKEN Center for Advanced Intelligence Project, Wako, Saitama, Japan

Source:

Findings of the Association for Computational Linguistics: ACL 2023 | 2023


DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder (Izacard and Grave, 2021) extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation, such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.
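The abstract mentions extending Fusion-in-Decoder (FiD) to literature review generation. The sketch below is a hypothetical, simplified illustration of the general FiD idea, not the authors' implementation: each cited paper becomes its own encoder input (the "review: ... chapter: ... abstract: ..." format is an assumption), the inputs are encoded independently, and the encoder states are concatenated so that a decoder could cross-attend over all cited papers at once. The functions `build_fid_inputs`, `toy_encode`, `fuse`, and `encoder_attention_cost` are illustrative names, and `toy_encode` is only a deterministic stand-in for a real Transformer encoder.

```python
from typing import List

Vector = List[float]


def build_fid_inputs(review_title: str, chapter_title: str,
                     cited_abstracts: List[str]) -> List[str]:
    """Build one encoder input per cited paper (hypothetical input format)."""
    return [
        f"review: {review_title} chapter: {chapter_title} abstract: {abstract}"
        for abstract in cited_abstracts
    ]


def toy_encode(text: str, dim: int = 4) -> List[Vector]:
    """Placeholder for a Transformer encoder: one deterministic vector per token."""
    states = []
    for token in text.split():
        code = sum(ord(ch) for ch in token)
        states.append([float((code * (i + 1)) % 97) for i in range(dim)])
    return states


def fuse(per_passage_states: List[List[Vector]]) -> List[Vector]:
    """FiD fusion step: concatenate independently encoded passages along the
    sequence axis so a decoder can cross-attend over all of them."""
    fused: List[Vector] = []
    for states in per_passage_states:
        fused.extend(states)
    return fused


def encoder_attention_cost(lengths: List[int], independent: bool) -> int:
    """Count pairwise self-attention scores computed in the encoder.

    independent=True:  FiD-style, each passage attends only within itself.
    independent=False: all passages concatenated into one long input.
    """
    if independent:
        return sum(n * n for n in lengths)
    total = sum(lengths)
    return total * total


if __name__ == "__main__":
    inputs = build_fid_inputs("A Survey", "Methods",
                              ["Paper A studies X.", "Paper B studies Y."])
    per_passage = [toy_encode(x) for x in inputs]
    fused = fuse(per_passage)
    lengths = [len(s) for s in per_passage]
    print(len(fused))                              # 20 (2 passages x 10 tokens)
    print(encoder_attention_cost(lengths, True))   # 200
    print(encoder_attention_cost(lengths, False))  # 400
```

Because each passage is encoded separately, encoder self-attention cost grows with the sum of squared passage lengths rather than with the square of the total input length. This property is what lets FiD-style models take many cited papers as input. Only the decoder sees the fused sequence.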


Pages: 6695-6708

Page count: 14

Related records (50 in total):

[1] A Large-Scale Dataset for Empathetic Response Generation
Welivita, Anuradha
Xie, Yubo
Pu, Pearl
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1251 - 1264
[2] RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization
Kamezawa, Hisashi
Nishida, Noriki
Shimizu, Nobuyuki
Miyazaki, Takashi
Nakayama, Hideki
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 8718 - 8735
[3] Generation and Analysis of a Large-Scale Urban Vehicular Mobility Dataset
Uppoor, Sandesh
Trullols-Cruces, Oscar
Fiore, Marco
Barcelo-Ordinas, Jose M.
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2014, 13 (05) : 1061 - 1075
[4] Introduction and Analysis of a Large-Scale Benchmark Automatic Vehicle Identification Dataset
He, Zhaocheng
Chen, Kaiying
Chen, Xinyu
Sun, Weiwei
INTERNATIONAL CONFERENCE ON TRANSPORTATION AND DEVELOPMENT 2018: CONNECTED AND AUTONOMOUS VEHICLES AND TRANSPORTATION SAFETY, 2018, : 35 - 43
[5] Large-Scale Ontology Matching: a Review of the Literature
Babalou, Samira
Kargar, Mohammad Javad
Davarpanah, Seyyed Hashem
2016 SECOND INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR), 2016, : 158 - 165
[6] Varta: A Large-Scale Headline-Generation Dataset for Indic Languages
Aralikatte, Rahul
Cheng, Ziling
Doddapaneni, Sumanth
Cheung, Jackie Chi Kit
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3468 - 3492
[7] DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles
Segarra, Encarna
Ahuir, Vicent
Hurtado, Lluis-F
Angel Gonzalez, Jose
NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5931 - 5943
[8] DMDD: A Large-Scale Dataset for Dataset Mentions Detection
Pan, Huitong
Zhang, Qi
Dragut, Eduard
Caragea, Cornelia
Latecki, Longin Jan
TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1132 - 1146
[9] Large-scale RDF Dataset Slicing
Marx, Edgard
Shekarpour, Saeedeh
Auer, Soeren
Ngomo, Axel-Cyrille Ngonga
2013 IEEE SEVENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2013), 2013, : 228 - 235
[10] Euler Clustering on Large-scale Dataset
Wu, Jian-Sheng
Zheng, Wei-Shi
Lai, Jian-Huang
Suen, Ching Y.
IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (04) : 502 - 515
