More Diverse Training, Better Compositionality! Evidence from Multimodal Language Learning

Cited by: 0
Authors
Volquardsen, Caspar [1 ]
Lee, Jae Hee [1 ]
Weber, Cornelius [1 ]
Wermter, Stefan [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Hamburg, Germany
Keywords
Compositional generalization; Computer vision; Multimodality; Sequence-to-sequence; Robotics
DOI
10.1007/978-3-031-15934-3_35
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial neural networks still fall short of human-level generalization and require a very large number of training examples to succeed. Model architectures that further improve generalization capabilities are therefore still an open research question. We created a multimodal dataset from simulation for measuring the compositional generalization of neural networks in multimodal language learning. The dataset consists of sequences showing a robot arm interacting with objects on a table in a simple 3D environment, with the goal of describing the interaction. Compositional object features, multiple actions, and distracting objects pose challenges to the model. We show that an LSTM encoder-decoder architecture jointly trained with a vision encoder surpasses previous performance and handles multiple visible objects. Visualization of important input dimensions shows that a model trained with multiple objects, but not a model trained on just one object, learns to ignore irrelevant objects. Furthermore, we show that additional modalities in the input improve the overall performance. We conclude that the underlying training data has a significant influence on the model's capability to generalize compositionally.
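The abstract describes the architecture only at a high level: per-frame visual features feed an LSTM encoder, whose final state initializes an LSTM decoder that emits the description of the interaction, with everything trained jointly end to end and with room for additional input modalities. The following is a minimal sketch of that wiring, assuming PyTorch; the layer sizes, the small CNN backbone, the optional extra-modality input, and the names VisionEncoder and Seq2SeqDescriber are illustrative placeholders, not the authors' configuration.

# Minimal sketch of a jointly trained vision encoder + LSTM encoder-decoder describer.
# Assumptions: PyTorch, toy dimensions, teacher forcing without start-token shifting.
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Small CNN that maps each RGB frame to a feature vector (illustrative backbone)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, frames):                          # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.conv(frames.flatten(0, 1)).flatten(1)  # (B*T, 32)
        return self.fc(x).view(b, t, -1)                # (B, T, feat_dim)

class Seq2SeqDescriber(nn.Module):
    """LSTM encoder over frame features, LSTM decoder over description tokens."""
    def __init__(self, vocab_size, feat_dim=128, hid=256, extra_dim=0):
        super().__init__()
        self.vision = VisionEncoder(feat_dim)
        self.encoder = nn.LSTM(feat_dim + extra_dim, hid, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hid)
        self.decoder = nn.LSTM(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, frames, tokens, extra=None):
        feats = self.vision(frames)                     # (B, T, feat_dim)
        if extra is not None:                           # optional extra modality, e.g. joint angles
            feats = torch.cat([feats, extra], dim=-1)
        _, state = self.encoder(feats)                  # encoder summary initializes the decoder
        dec_out, _ = self.decoder(self.embed(tokens), state)
        return self.out(dec_out)                        # (B, L, vocab_size) logits

# Joint training: the loss backpropagates through decoder, encoder, and vision encoder together.
model = Seq2SeqDescriber(vocab_size=50)
frames = torch.randn(2, 8, 3, 64, 64)                  # toy batch: 2 sequences of 8 frames
tokens = torch.randint(0, 50, (2, 6))                  # toy target descriptions
logits = model(frames, tokens)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), tokens.flatten())
loss.backward()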
Pages: 417-428
Page count: 12