More Diverse Training, Better Compositionality! Evidence from Multimodal Language Learning

Cited by: 0
Authors
Volquardsen, Caspar [1 ]
Lee, Jae Hee [1 ]
Weber, Cornelius [1 ]
Wermter, Stefan [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Hamburg, Germany
Keywords
Compositional generalization; Computer vision; Multimodality; Sequence-to-sequence; Robotics
DOI
10.1007/978-3-031-15934-3_35
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Artificial neural networks still fall short of human-level generalization and require a very large number of training examples to succeed. Model architectures that further improve generalization capabilities are therefore still an open research question. We created a multimodal dataset from simulation for measuring the compositional generalization of neural networks in multimodal language learning. The dataset consists of sequences showing a robot arm interacting with objects on a table in a simple 3D environment, with the goal of describing the interaction. Compositional object features, multiple actions, and distracting objects pose challenges to the model. We show that an LSTM encoder-decoder architecture trained jointly with a vision encoder surpasses previous performance and handles multiple visible objects. Visualization of important input dimensions shows that a model trained with multiple objects, unlike a model trained on just one object, learns to ignore irrelevant objects. Furthermore, we show that additional modalities in the input improve overall performance. We conclude that the underlying training data has a significant influence on the model's capability to generalize compositionally.
Pages: 417-428
Page count: 12
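The abstract describes a vision encoder trained jointly with an LSTM encoder-decoder that maps a sequence of frames (plus optional further modalities) to a textual description of the interaction. A minimal PyTorch sketch of such an architecture follows; the CNN structure, layer sizes, vocabulary size, and frame shapes here are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (assumed details): CNN vision encoder -> LSTM encoder
# over the frame sequence -> LSTM decoder emitting description tokens.
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Maps each RGB frame to a feature vector (CNN details are assumptions)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, frames):                              # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.conv(frames.flatten(0, 1)).flatten(1)  # (B*T, 64)
        return self.proj(feats).view(b, t, -1)              # (B, T, feat_dim)

class Seq2SeqDescriber(nn.Module):
    """Vision encoder and LSTM encoder-decoder trained jointly end to end.
    Additional input modalities (e.g., proprioception) could be concatenated
    to the per-frame features; that variant is omitted here for brevity."""
    def __init__(self, vocab_size=50, feat_dim=256, hidden=256):
        super().__init__()
        self.vision = VisionEncoder(feat_dim)
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frames, tokens):
        _, state = self.encoder(self.vision(frames))         # summarize the sequence
        dec_out, _ = self.decoder(self.embed(tokens), state) # condition on encoder state
        return self.out(dec_out)                             # (B, L, vocab_size) logits

# Usage: 4 frames of 64x64 video; teacher-forced next-token prediction.
model = Seq2SeqDescriber()
frames = torch.randn(2, 4, 3, 64, 64)
tokens = torch.randint(0, 50, (2, 6))
logits = model(frames, tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])
```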