More Diverse Training, Better Compositionality! Evidence from Multimodal Language Learning

Cited by: 0
Authors
Volquardsen, Caspar [1 ]
Lee, Jae Hee [1 ]
Weber, Cornelius [1 ]
Wermter, Stefan [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Hamburg, Germany
Keywords
Compositional generalization; Computer vision; Multimodality; Sequence-to-sequence; Robotics
DOI
10.1007/978-3-031-15934-3_35
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial neural networks still fall short of human-level generalization and require a very large number of training examples to succeed. Model architectures that further improve generalization capabilities are therefore still an open research question. We created a multimodal dataset from simulation for measuring the compositional generalization of neural networks in multimodal language learning. The dataset consists of sequences showing a robot arm interacting with objects on a table in a simple 3D environment, with the goal of describing the interaction. Compositional object features, multiple actions, and distracting objects pose challenges to the model. We show that an LSTM encoder-decoder architecture jointly trained with a vision encoder surpasses previous performance and handles multiple visible objects. Visualization of important input dimensions shows that a model trained with multiple objects, but not a model trained on just one object, learns to ignore irrelevant objects. Furthermore, we show that additional modalities in the input improve the overall performance. We conclude that the underlying training data has a significant influence on the model's capability to generalize compositionally.
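The abstract describes the architecture only at a high level: per-frame visual features feed an LSTM encoder, whose final state initializes an LSTM decoder that emits the description of the interaction, with everything trained jointly end to end and with room for additional input modalities. The following is a minimal sketch of that wiring, assuming PyTorch; the layer sizes, the small CNN backbone, the optional extra-modality input, and the names VisionEncoder and Seq2SeqDescriber are illustrative placeholders, not the authors' configuration.

# Minimal sketch of a jointly trained vision encoder + LSTM encoder-decoder describer.
# Assumptions: PyTorch, toy dimensions, teacher forcing without start-token shifting.
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Small CNN that maps each RGB frame to a feature vector (illustrative backbone)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, frames):                          # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.conv(frames.flatten(0, 1)).flatten(1)  # (B*T, 32)
        return self.fc(x).view(b, t, -1)                # (B, T, feat_dim)

class Seq2SeqDescriber(nn.Module):
    """LSTM encoder over frame features, LSTM decoder over description tokens."""
    def __init__(self, vocab_size, feat_dim=128, hid=256, extra_dim=0):
        super().__init__()
        self.vision = VisionEncoder(feat_dim)
        self.encoder = nn.LSTM(feat_dim + extra_dim, hid, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hid)
        self.decoder = nn.LSTM(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, frames, tokens, extra=None):
        feats = self.vision(frames)                     # (B, T, feat_dim)
        if extra is not None:                           # optional extra modality, e.g. joint angles
            feats = torch.cat([feats, extra], dim=-1)
        _, state = self.encoder(feats)                  # encoder summary initializes the decoder
        dec_out, _ = self.decoder(self.embed(tokens), state)
        return self.out(dec_out)                        # (B, L, vocab_size) logits

# Joint training: the loss backpropagates through decoder, encoder, and vision encoder together.
model = Seq2SeqDescriber(vocab_size=50)
frames = torch.randn(2, 8, 3, 64, 64)                  # toy batch: 2 sequences of 8 frames
tokens = torch.randint(0, 50, (2, 6))                  # toy target descriptions
logits = model(frames, tokens)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), tokens.flatten())
loss.backward()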
Pages: 417-428
Page count: 12