Multilingual Argument Mining: Datasets and Analysis

Authors
Toledo-Ronen, Orith [1]
Orbach, Matan [1]
Bilu, Yonatan [1]
Spector, Artem [1]
Slonim, Noam [1]
Affiliation
[1] IBM Res, Cambridge, MA 02142 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020 | 2020
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
Pages: 15