Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Cited by: 0
Authors
Zhang, Liang [1]
Lin, Jionghao [2, 3, 4]
Sabatini, John [1]
Borchers, Conrad [3]
Weitekamp, Daniel [3]
Cao, Meng [3]
Hollander, John [5]
Hu, Xiangen [6]
Graesser, Arthur C. [1]
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China
[3] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
[4] Monash Univ, Fac Informat Technol, Ctr Learning Analyt, Clayton, Vic 3800, Australia
[5] Arkansas State Univ, Jonesboro, AR 72401 USA
[6] Hong Kong Polytech Univ, Dept Appl Social Sci, Hong Kong, Peoples R China
Keywords
Data augmentation; data sparsity; generative artificial intelligence (GenAI); intelligent tutoring system (ITS); learning performance data; FRAMEWORK
DOI
10.1109/TLT.2025.3526582
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning performance data, such as correct or incorrect answers and problem-solving attempts in intelligent tutoring systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This data sparsity presents challenges to using learner models to effectively predict learners' future performance and explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data can be represented as a 3-D tensor with dimensions corresponding to learners, questions, and attempts, effectively capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation on knowledge tracing (KT) tasks that predict missing performance values based on real observations. Third, data augmentation using generative artificial intelligence models, including generative adversarial networks (GANs, specifically vanilla GANs) and generative pretrained transformers (GPTs, specifically GPT-4o), generates data tailored to individual clusters of learning performance. We tested this systematic framework on adult literacy datasets from AutoTutor lessons developed for adult reading comprehension. We found that tensor factorization outperformed baseline KT techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation. The vanilla GAN-based augmentation demonstrated greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution. This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach for the evaluation and optimization of ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics.
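The first two steps of the abstract (a learners x questions x attempts tensor, then factorization-based imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the factorization variant, so a plain rank-2 CP (CANDECOMP/PARAFAC) model fitted by gradient descent on observed cells is assumed here, and the function names `build_tensor` and `cp_impute`, the hyperparameters, and the toy records are all hypothetical.

```python
import numpy as np


def build_tensor(records, n_learners, n_questions, n_attempts):
    """Place (learner, question, attempt, score) records in a 3-D tensor.

    Cells with no record stay NaN, which marks them as missing.
    """
    tensor = np.full((n_learners, n_questions, n_attempts), np.nan)
    for learner, question, attempt, score in records:
        tensor[learner, question, attempt] = score
    return tensor


def cp_impute(tensor, rank=2, steps=2000, lr=0.02, reg=0.01, seed=0):
    """Fill missing cells using a CP model fitted to the observed cells only.

    Assumption: squared-error loss with L2 regularization; the source
    abstract does not specify the loss or the optimizer.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(tensor)                # True where a score was observed
    observed = np.where(mask, tensor, 0.0)  # zeros are ignored via the mask
    n_l, n_q, n_a = tensor.shape
    # One factor matrix per tensor mode: learners, questions, attempts.
    fl = rng.uniform(0.2, 0.8, size=(n_l, rank))
    fq = rng.uniform(0.2, 0.8, size=(n_q, rank))
    fa = rng.uniform(0.2, 0.8, size=(n_a, rank))
    for _ in range(steps):
        recon = np.einsum("ir,jr,kr->ijk", fl, fq, fa)
        err = mask * (recon - observed)     # error on observed cells only
        grad_l = np.einsum("ijk,jr,kr->ir", err, fq, fa) + reg * fl
        grad_q = np.einsum("ijk,ir,kr->jr", err, fl, fa) + reg * fq
        grad_a = np.einsum("ijk,ir,jr->kr", err, fl, fq) + reg * fa
        fl -= lr * grad_l
        fq -= lr * grad_q
        fa -= lr * grad_a
    recon = np.clip(np.einsum("ir,jr,kr->ijk", fl, fq, fa), 0.0, 1.0)
    # Keep real observations; use model predictions only for missing cells.
    return np.where(mask, tensor, recon)


# Toy data (hypothetical): 3 learners, 3 questions, 3 attempts, 7 observed cells.
records = [
    (0, 0, 0, 0.0), (0, 0, 1, 1.0), (0, 1, 0, 1.0),
    (1, 0, 0, 0.0), (1, 1, 1, 1.0),
    (2, 2, 0, 0.0), (2, 2, 2, 1.0),
]
tensor = build_tensor(records, 3, 3, 3)
imputed = cp_impute(tensor)
print("missing before:", int(np.isnan(tensor).sum()))
print("missing after:", int(np.isnan(imputed).sum()))
```

Masking the error term is the key design choice in this sketch: only observed cells drive the fit, so the imputed values are predictions grounded in real observations rather than in placeholder zeros.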
Pages: 145-164
Page count: 20
Related papers
50 in total
  • [21] Biomedical Data Augmentation Using Generative Adversarial Neural Networks
    Calimeri, Francesco
    Marzullo, Aldo
    Stamile, Claudio
    Terracina, Giorgio
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 626 - 634
  • [22] SEQUENTIAL IOT DATA AUGMENTATION USING GENERATIVE ADVERSARIAL NETWORKS
    Tschuchnig, Maximilian Ernst
    Ferner, Cornelia
    Wegenkittl, Stefan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4212 - 4216
  • [23] A comprehensive survey for generative data augmentation
    Chen, Yunhao
    Yan, Zihui
    Zhu, Yunjie
    NEUROCOMPUTING, 2024, 600
  • [24] Generative Data Augmentation for Commonsense Reasoning
    Yang, Yiben
    Malaviya, Chaitanya
    Fernandez, Jared
    Swayamdipta, Swabha
    Le Bras, Ronan
    Wang, Ji-Ping
    Bhagavatula, Chandra
    Choi, Yejin
    Downey, Doug
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1008 - 1025
  • [25] AI4AVP: an antiviral peptides predictor in deep learning approach with generative adversarial network data augmentation
    Lin, Tzu-Tang
    Sun, Yih-Yun
    Wang, Ching-Tien
    Cheng, Wen-Chih
    Lu, I-Hsuan
    Lin, Chung-Yen
    Chen, Shu-Hwa
    Mulder, Nicola
    BIOINFORMATICS ADVANCES, 2022, 2 (01):
  • [26] Toward Understanding Generative Data Augmentation
    Zheng, Chenyu
    Wu, Guoqiang
    Li, Chongxuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [27] Generative Data Augmentation for Automatic Meter Reading Using CNNs
    Sripanuskul, Nuntida
    Buayai, Prawit
    Mao, Xiaoyang
    IEEE ACCESS, 2022, 10 : 28471 - 28486
  • [28] END TO END GENERATIVE META CURRICULUM LEARNING FOR MEDICAL DATA AUGMENTATION
    Li, Meng
    Li, Chaoyi
    Peng, Can
    Liu, Liangchen
    Lovell, Brian
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2155 - 2159
  • [29] Generative Data Augmentation of Human Biomechanics
    Karason, Halldor
    Ritrovato, Pierluigi
    Maffulli, Nicola
    Tortorella, Francesco
    IMAGE ANALYSIS AND PROCESSING - ICIAP 2023 WORKSHOPS, PT I, 2024, 14365 : 482 - 493
  • [30] Data Augmentation for Voiceprint Recognition Using Generative Adversarial Networks
    Lin, Yao-San
    Chen, Hung-Yu
    Huang, Mei-Ling
    Hsieh, Tsung-Yu
    ALGORITHMS, 2024, 17 (12)