A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

Cited by: 0

Authors:

de Loupy, Claude [1]

Guegan, Marie [1]

Ayache, Christelle [1]

Seng, Somara [1]

Moreno, Juan-Manuel Torres [2,3]

Affiliations:

[1] Syllabs, F-75013 Paris, France

[2] Lab Informat Avignon UAPV, F-84911 Avignon, France

[3] Ecole Polytech, Montreal, PQ H3C 3A7, Canada

Source:

LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2010

Keywords:

DOI:

Not available

Chinese Library Classification number:

H [Language and Writing];

Discipline classification code:

05;

Abstract:

This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first is the only one we know of in this language. It contains 20 topics with 20 documents each. A first set of 10 documents per topic is summarized, and the second set is then used to produce an update summary (new information only). Four annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one. Four annotators were asked to compress the 8432 sentences. This is the largest corpus of compressed sentences we know of in any language. The paper provides figures for comparing the annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their POS, position of dropped tokens in the sentence compression phase, etc. These figures show substantial differences from one annotator to another. The paper also shows that annotators use different compression strategies depending on the length of the sentence.
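As a rough illustration of the kind of figure the abstract mentions, the sketch below computes a compression rate for one sentence pair. It assumes the rate is the fraction of whitespace-separated tokens kept, which is only one common convention and may differ from the definition used in the paper. The function name and the example sentences are hypothetical and are not taken from the corpus. The last line checks the abstract count under the assumption that each annotator wrote one initial and one update summary per topic.

```python
def compression_rate(original: str, compressed: str) -> float:
    """Fraction of whitespace-separated tokens kept after compression.

    Assumption: tokens are split on whitespace and the rate is
    kept tokens / original tokens. The paper may tokenize differently.
    """
    original_tokens = original.split()
    if not original_tokens:
        raise ValueError("original sentence has no tokens")
    return len(compressed.split()) / len(original_tokens)


# Hypothetical sentence pair, not taken from the corpus.
original = "Le ministre a annoncé hier une nouvelle réforme importante"
compressed = "Le ministre a annoncé une réforme"
rate = compression_rate(original, compressed)

# Consistency check on the counts in the abstract, assuming
# 20 topics x 2 summaries per topic (initial + update) x 4 annotators.
total_abstracts = 20 * 2 * 4
```

With this convention, a lower value means a more aggressive compression; averaging the value per annotator would give one of the per-annotator comparisons the abstract describes.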

Citation:

Pages: 3113-3118

Page count: 6
