A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

Authors
de Loupy, Claude [1]
Guegan, Marie [1]
Ayache, Christelle [1]
Seng, Somara [1]
Moreno, Juan-Manuel Torres [2, 3]
Affiliations
[1] Syllabs, F-75013 Paris, France
[2] Lab Informat Avignon UAPV, F-84911 Avignon, France
[3] Ecole Polytech, Montreal, PQ H3C 3A7, Canada
DOI
Not available
Chinese Library Classification
H [Language, Linguistics]
Discipline classification code
05
Abstract
This paper presents two French corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. The first is, to our knowledge, the only corpus of its kind in French. It contains 20 topics with 20 documents each. For each topic, a first set of 10 documents is summarized, and the second set is then used to produce an update summary containing only new information. Four annotators were involved and produced a total of 160 abstracts (20 topics × 2 document sets × 4 annotators). The second corpus contains all the sentences of the first: four annotators were asked to compress its 8,432 sentences. To our knowledge, this is the largest corpus of compressed sentences in any language. The paper provides figures for comparing the annotators, including compression rates, number of tokens per sentence, percentage of tokens kept according to their part of speech (POS), and position of dropped tokens in the sentence compression phase. These figures show substantial differences from one annotator to another. The paper also discusses how the compression strategies used vary with sentence length.
Pages: 3113-3118
Page count: 6