A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

Cited by: 0

Authors:

de Loupy, Claude [1]

Guegan, Marie [1]

Ayache, Christelle [1]

Seng, Somara [1]

Moreno, Juan-Manuel Torres [2, 3]

Affiliations:

[1] Syllabs, F-75013 Paris, France

[2] Lab Informat Avignon UAPV, F-84911 Avignon, France

[3] Ecole Polytech, Montreal, PQ H3C 3A7, Canada

Source:

LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2010

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

H [Language, Writing];

Discipline classification code:

05;

Abstract:

This paper presents two French-language corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. The first is the only such corpus we know of in French. It contains 20 topics with 20 documents each. For each topic, a first set of 10 documents is summarized, and the second set is then used to produce an update summary (new information). Four annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one: four annotators were asked to compress its 8,432 sentences. This is the largest corpus of compressed sentences we know of in any language. The paper provides figures for comparing the annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their part of speech (POS), position of dropped tokens in the sentence compression phase, etc. These figures show substantial differences from one annotator to another. The paper also examines how the compression strategies used vary with sentence length.

Pages: 3113-3118

Page count: 6
