LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization

Cited by: 0
Authors
Nguyen, Laura [1 ,3 ]
Scialom, Thomas [1 ,2 ]
Piwowarski, Benjamin [3 ]
Staiano, Jacopo [1 ,4 ]
Affiliations
[1] reciTAL, Paris, France
[2] Meta AI, Paris, France
[3] Sorbonne Univ, CNRS, ISIR, F-75005 Paris, France
[4] Univ Trento, Trento, TN, Italy
Keywords: (none listed)
DOI: (not available)
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Text Summarization is a popular task and an active area of research for the Natural Language Processing community. It requires accounting for long input texts, a characteristic which poses computational challenges for neural models. Moreover, real-world documents come in a variety of complex, visually rich layouts. This information is of great relevance, whether to highlight salient content or to encode long-range interactions between textual passages. Yet, all publicly available summarization datasets only provide plain text content. To facilitate research on how to exploit visual/layout information to better capture long-range dependencies in summarization models, we present LoRaLay, a collection of datasets for long-range summarization with accompanying visual/layout information. We extend existing and popular English datasets (arXiv and PubMed) with visual/layout information and propose four novel datasets - consistently built from scholarly resources - covering French, Spanish, Portuguese, and Korean languages. Further, we propose new baselines merging layout-aware and long-range models - two orthogonal approaches - and obtain state-of-the-art results, showing the importance of combining both lines of research.
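To make the "accompanying visual/layout information" concrete, here is a minimal sketch of what a layout-aware summarization record could look like. It assumes a LayoutLM-style convention in which each token carries a bounding box normalized to a 0-1000 coordinate grid; the field names and the example values are illustrative, not the actual LoRaLay schema.

```python
# Sketch of a layout-aware summarization record. Assumes the common
# LayoutLM-style convention of normalizing token bounding boxes to a
# 0-1000 grid; field names below are hypothetical, not the LoRaLay schema.

def normalize_box(box, page_width, page_height):
    """Scale an absolute (x0, y0, x1, y1) box to the 0-1000 grid."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

# Hypothetical example: three tokens from a US-Letter PDF page (612x792 pt).
page_w, page_h = 612, 792
record = {
    "tokens": ["Abstract", "Text", "Summarization"],
    "boxes": [(72, 100, 150, 112), (72, 120, 100, 132), (104, 120, 210, 132)],
    "summary": "A reference abstract would go here.",
}
# A layout-aware model consumes tokens together with their normalized boxes.
record["norm_boxes"] = [normalize_box(b, page_w, page_h) for b in record["boxes"]]
```

Normalizing to a fixed grid makes positions comparable across pages of different physical sizes, which is what lets a single positional embedding table encode layout for heterogeneous documents.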
Pages: 636-651 (16 pages)