Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Cited by: 0
Authors
Zhang, Liang [1]
Lin, Jionghao [2, 3, 4]
Sabatini, John [1]
Borchers, Conrad [3]
Weitekamp, Daniel [3]
Cao, Meng [3]
Hollander, John [5]
Hu, Xiangen [6]
Graesser, Arthur C. [1]
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China
[3] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
[4] Monash Univ, Fac Informat Technol, Ctr Learning Analyt, Clayton, Vic 3800, Australia
[5] Arkansas State Univ, Jonesboro, AR 72401 USA
[6] Hong Kong Polytech Univ, Dept Appl Social Sci, Hong Kong, Peoples R China
Keywords
Data augmentation; data sparsity; generative artificial intelligence (GenAI); intelligent tutoring system (ITS); learning performance data; framework
DOI
10.1109/TLT.2025.3526582
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning performance data, such as correct or incorrect answers and problem-solving attempts in intelligent tutoring systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This data sparsity presents challenges to using learner models to effectively predict learners' future performance and explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data can be represented as a 3-D tensor with dimensions corresponding to learners, questions, and attempts, effectively capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation on knowledge tracing (KT) tasks that predict missing performance values based on real observations. Third, data augmentation using generative artificial intelligence models, including generative adversarial networks (GANs, specifically vanilla GANs) and generative pretrained transformers (GPTs, specifically GPT-4o), generates data tailored to individual clusters of learning performance. We tested this systematic framework on adult literacy datasets from AutoTutor lessons developed for adult reading comprehension. We found that tensor factorization outperformed baseline KT techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation. The vanilla GAN-based augmentation demonstrated greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution. This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach for the evaluation and optimization of ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics.
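The first two steps described in the abstract (a learners × questions × attempts tensor, then factorization-based imputation of its missing cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which factorization is used, so the sketch assumes a plain CP (rank-R) decomposition with a logistic link, fitted by gradient descent on observed cells only. The function names `build_tensor` and `cp_impute` and all hyperparameters are hypothetical.

```python
import numpy as np


def build_tensor(records, n_learners, n_questions, n_attempts):
    """Place (learner, question, attempt, score) records in a 3-D tensor.

    Cells with no record stay NaN, which marks them as unobserved.
    """
    tensor = np.full((n_learners, n_questions, n_attempts), np.nan)
    for learner, question, attempt, score in records:
        tensor[learner, question, attempt] = score
    return tensor


def cp_impute(tensor, rank=3, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fill NaN cells with predictions from a masked CP factorization.

    Assumption: scores lie in [0, 1], so a sigmoid link and a
    cross-entropy loss over the observed cells are used.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(tensor)                  # True where a score was observed
    observed = np.where(mask, tensor, 0.0)    # NaN replaced so arithmetic is safe
    n_i, n_j, n_k = tensor.shape

    # One factor matrix per mode: learners, questions, attempts.
    A = rng.normal(scale=0.5, size=(n_i, rank))
    B = rng.normal(scale=0.5, size=(n_j, rank))
    C = rng.normal(scale=0.5, size=(n_k, rank))

    for _ in range(epochs):
        logits = np.einsum("ir,jr,kr->ijk", A, B, C)
        pred = 1.0 / (1.0 + np.exp(-logits))
        # Gradient of cross-entropy w.r.t. the logits, zeroed on missing cells.
        err = mask * (pred - observed)
        grad_A = np.einsum("ijk,jr,kr->ir", err, B, C) + reg * A
        grad_B = np.einsum("ijk,ir,kr->jr", err, A, C) + reg * B
        grad_C = np.einsum("ijk,ir,jr->kr", err, A, B) + reg * C
        A -= lr * grad_A
        B -= lr * grad_B
        C -= lr * grad_C

    logits = np.einsum("ir,jr,kr->ijk", A, B, C)
    dense = 1.0 / (1.0 + np.exp(-logits))
    # Keep real observations; use model predictions only where data was missing.
    return np.where(mask, tensor, dense)


if __name__ == "__main__":
    # Toy data: 3 of 8 cells observed (hypothetical values).
    records = [(0, 0, 0, 1.0), (0, 1, 0, 0.0), (1, 0, 1, 1.0)]
    sparse = build_tensor(records, n_learners=2, n_questions=2, n_attempts=2)
    print(cp_impute(sparse, rank=2, epochs=200))
```

Observed cells are copied through unchanged and only the missing cells take model predictions, which mirrors the abstract's point that imputation is grounded in real observations. The generative augmentation step (GAN or GPT-4o) would then operate on the densified tensor and is not sketched here.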
Pages: 145-164
Page count: 20