Multilingual Argument Mining: Datasets and Analysis

Cited by: 0

Authors:

Toledo-Ronen, Orith [1]

Orbach, Matan [1]

Bilu, Yonatan [1]

Spector, Artem [1]

Slonim, Noam [1]

Affiliations:

[1] IBM Res, Cambridge, MA 02142 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020 | 2020

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.

Pages: 15

