A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

Cited by: 0
Authors
de Loupy, Claude [1]
Guegan, Marie [1]
Ayache, Christelle [1]
Seng, Somara [1]
Torres-Moreno, Juan-Manuel [2, 3]
Affiliations
[1] Syllabs, F-75013 Paris, France
[2] Lab Informat Avignon UAPV, F-84911 Avignon, France
[3] Ecole Polytech, Montreal, PQ H3C 3A7, Canada
DOI
Not available
Chinese Library Classification
H [Language, Linguistics]
Discipline classification code
05
Abstract
This paper presents two French-language corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. To our knowledge, the first is the only corpus of its kind in French. It contains 20 topics with 20 documents each. For each topic, a first set of 10 documents is summarized, and the second set is then used to produce an update summary (new information). Four annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one: four annotators were asked to compress its 8,432 sentences. To our knowledge, this is the largest corpus of compressed sentences in any language. The paper provides figures for comparing the annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their part of speech (POS), position of dropped tokens in the sentence compression phase, etc. These figures show substantial differences from one annotator to another. The paper also examines how compression strategies differ according to sentence length.
Pages: 3113-3118
Page count: 6