MOCODA: Model-based Counterfactual Data Augmentation

Citations: 0
Authors
Pitis, Silviu [1, 2]
Creager, Elliot [1, 2]
Mandlekar, Ajay [3]
Garg, Animesh [1, 2, 3]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] NVIDIA, Santa Clara, CA, USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to the real world, they will need to react to and reason about unseen combinations of objects. We argue that the ability to recognize and use local factorization in transition dynamics is a key element in unlocking the power of multi-object reasoning. To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions. Knowing the local structure also allows us to predict which unseen states and actions this dynamics model will generalize to. We propose to leverage these observations in a novel Model-based Counterfactual Data Augmentation (MOCODA) framework. MOCODA applies a learned locally factored dynamics model to an augmented distribution of states and actions to generate counterfactual transitions for RL. MOCODA works with a broader set of local structures than prior work and allows for direct control over the augmented training distribution. We show that MOCODA enables RL agents to learn policies that generalize to unseen states and actions. We use MOCODA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail.
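The idea in the abstract, a locally factored dynamics model applied to an augmented state-action distribution, can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the two-object 1-D environment, the linear per-object model, and the names `true_step`, `fit_factor`, and `model_step`. The augmented distribution is built by simply recombining per-object samples from different transitions, which is only one possible choice.

```python
import random

rng = random.Random(0)

def true_step(state, action):
    # Hypothetical ground truth: object i moves by 0.5 * action[i],
    # independently of the other object (a locally factored transition).
    return tuple(s + 0.5 * a for s, a in zip(state, action))

def fit_factor(xs, us, ys):
    # Least-squares fit of y = w_x * x + w_u * u for one object,
    # solved in closed form from the 2x2 normal equations.
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, ys))
    suy = sum(u * y for u, y in zip(us, ys))
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

# Empirical data: the two objects always share a region,
# both in [0, 1) or both in [2, 3). Mixed combinations are never observed.
dataset = []
for _ in range(200):
    base = rng.choice([0.0, 2.0])
    state = (base + rng.random(), base + rng.random())
    action = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    dataset.append((state, action, true_step(state, action)))

# One small model per object; each sees only its own state and action component.
models = [
    fit_factor([s[i] for s, _, _ in dataset],
               [a[i] for _, a, _ in dataset],
               [n[i] for _, _, n in dataset])
    for i in range(2)
]

def model_step(state, action):
    return tuple(w_x * s + w_u * a
                 for (w_x, w_u), s, a in zip(models, state, action))

# Augmented distribution: recombine per-object samples from different
# transitions, then label the next state with the learned factored model.
augmented = []
for _ in range(200):
    (s1, a1, _), (s2, a2, _) = rng.choice(dataset), rng.choice(dataset)
    state, action = (s1[0], s2[1]), (a1[0], a2[1])
    augmented.append((state, action, model_step(state, action)))

# Counterfactual transitions whose object combination never occurred in the data.
unseen = [t for t in augmented if (t[0][0] < 1.5) != (t[0][1] < 1.5)]
max_error = max(
    abs(p - q)
    for state, action, pred in unseen
    for p, q in zip(pred, true_step(state, action))
)
print(len(unseen), max_error < 1e-6)
```

Because each per-object model depends only on its own object, it stays accurate on the mixed-region states even though no such state appears in `dataset`. A single monolithic model fit on the joint state would have no such guarantee.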
Pages: 14
Related papers
50 results in total
  • [31] AD-AUG: Adversarial Data Augmentation for Counterfactual Recommendation
    Wang, Yifan
    Qin, Yifang
    Han, Yu
    Yin, Mingyang
    Zhou, Jingren
    Yang, Hongxia
    Zhang, Ming
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I, 2023, 13713 : 474 - 490
  • [32] MARITRAC: Maritime trajectory classification using object instance segmentation with model-based generated data augmentation
    d'Afflisio, Enrica
    Millefiori, Leonardo M.
    Braca, Paolo
    Guerriero, Marco
    2024 27TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION, FUSION 2024, 2024
  • [33] A DIABETES RISK PREDICTING METHOD WITH MULTI-STRATEGY COUNTERFACTUAL-BASED DATA AUGMENTATION
    Wang, Chen
    Liu, Yan-Yi
    Diao, Zhao-Shuo
    Tang, Jia-Wei
    Wen, Ying-You
    Yang, Xiao-Tao
    FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2023, 31 (06)
  • [34] SCM4SR: Structural Causal Model-based Data Augmentation for Robust Session-based Recommendation
    Gupta, Muskan
    Gupta, Priyanka
    Narwariya, Jyoti
    Vig, Lovekesh
    Shroff, Gautam
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024 : 2609 - 2613
  • [35] Model-based clustering of longitudinal data
    McNicholas, Paul D.
    Murphy, T. Brendan
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2010, 38 (01) : 153 - 168
  • [36] Boosting for model-based data clustering
    Saffari, Amir
    Bischof, Horst
    PATTERN RECOGNITION, 2008, 5096 : 51 - 60
  • [37] Model-based biclustering of clickstream data
    Melnykov, Volodymyr
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 93 : 31 - 45
  • [38] Model-based clustering for longitudinal data
    De la Cruz-Mesia, Rolando
    Quintana, Fernando A.
    Marshall, Guillermo
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (03) : 1441 - 1457
  • [39] Model-Based Clustering of Temporal Data
    El Assaad, Hani
    Same, Allou
    Govaert, Gerard
    Aknin, Patrice
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2013, 2013, 8131 : 9 - 16
  • [40] Model-based integration and interpretation of data
    Petersen, J
    2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004 : 815 - 820