PMIndiaSum: Multilingual and Cross-lingual Headline Summarization for Languages in India

被引:0
|
作者
Urlanal, Ashok [1 ]
Chen, Pinzhen [2 ]
Zhao, Zheng [2 ]
Cohen, Shay B. [2 ]
Shrivastava, Manish [1 ]
Haddow, Barry [2 ]
机构
[1] IIIT Hyderabad, Hyderabad, India
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
基金
英国科研创新办公室;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper introduces PMIndiaSum, a multilingual and massively parallel summarization corpus focused on languages in India. Our corpus provides a training and testing ground for four language families, 14 languages, and the largest to date with 196 language pairs. We detail our construction workflow including data acquisition, processing, and quality assurance. Furthermore, we publish benchmarks for monolingual, cross-lingual, and multilingual summarization by fine-tuning, prompting, as well as translate-and-summarize. Experimental results confirm the crucial role of our data in aiding summarization between Indian languages. Our dataset is publicly available and can be freely modified and re-distributed.(1)
引用
收藏
页码:11606 / 11628
页数:23
相关论文
共 50 条
  • [1] Cross-lingual timeline summarization
    Cagliero, Luca
    La Quatra, Moreno
    Garza, Paolo
    Baralis, Elena
    2021 IEEE FOURTH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE 2021), 2021, : 45 - 53
  • [2] A Survey on Cross-Lingual Summarization
    Wang, Jiaan
    Meng, Fandong
    Zheng, Duo
    Liang, Yunlong
    Li, Zhixu
    Qu, Jianfeng
    Zhou, Jie
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1304 - 1323
  • [3] Cross-lingual and Multilingual CLIP
    Carlsson, Fredrik
    Eisen, Philipp
    Rekathati, Faton
    Sahlgren, Magnus
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6848 - 6854
  • [4] NCLS: Neural Cross-Lingual Summarization
    Zhu, Junnan
    Wang, Qian
    Wang, Yining
    Zhou, Yu
    Zhang, Jiajun
    Wang, Shaonan
    Zong, Chengqing
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3054 - 3064
  • [5] Review of Research on Cross-Lingual Summarization
    Zheng, Bofei
    Yun, Jing
    Liu, Limin
    Jiao, Lei
    Yuan, Jingshu
    Computer Engineering and Applications, 2023, 59 (13) : 49 - 60
  • [6] Evaluating Factuality in Cross-lingual Summarization
    Gao, Mingqi
    Wang, Wenqing
    Wan, Xiaojun
    Xu, Yuemei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12415 - 12431
  • [7] Understanding Translationese in Cross-Lingual Summarization
    Wang, Jiaan
    Meng, Fandong
    Liang, Yunlong
    Zhang, Tingyi
    Xu, Jiarong
    Li, Zhixu
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3837 - 3849
  • [8] Towards Making the Most of Knowledge Across Languages for Multimodal Cross-Lingual Summarization
    Shi, Xiaorui
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 424 - 438
  • [9] Cross-Lingual Validation of Multilingual Wordnets
    Tufis, Dan
    Ion, Radu
    Barbu, Eduard
    Barbu, Verginica
    GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS, 2003, : 332 - 340
  • [10] SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism
    Fatima, Mehwish
    Kolber, Tim
    Markert, Katja
    Strube, Michael
    NewSumm 2023 - Proceedings of the 4th New Frontiers in Summarization Workshop, Proceedings of EMNLP Workshop, 2023, : 24 - 40