More Diverse Training, Better Compositionality! Evidence from Multimodal Language Learning

Cited by: 0
Authors
Volquardsen, Caspar [1]
Lee, Jae Hee [1]
Weber, Cornelius [1]
Wermter, Stefan [1]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Hamburg, Germany
Keywords
Compositional generalization; Computer vision; Multimodality; Sequence-to-sequence; Robotics
DOI
10.1007/978-3-031-15934-3_35
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial neural networks still fall short of human-level generalization and require a very large number of training examples to succeed. Model architectures that further improve generalization capabilities are therefore still an open research question. We created a multimodal dataset from simulation for measuring the compositional generalization of neural networks in multimodal language learning. The dataset consists of sequences showing a robot arm interacting with objects on a table in a simple 3D environment, with the goal of describing the interaction. Compositional object features, multiple actions, and distracting objects pose challenges to the model. We show that an LSTM encoder-decoder architecture trained jointly with a vision encoder surpasses previous performance and handles multiple visible objects. Visualization of important input dimensions shows that a model trained with multiple objects, but not a model trained on just one object, has learnt to ignore irrelevant objects. Furthermore, we show that additional modalities in the input improve the overall performance. We conclude that the underlying training data has a significant influence on the model's capability to generalize compositionally.
Pages: 417-428
Number of pages: 12