Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Authors
Zhang, Liang [1]
Lin, Jionghao [2, 3, 4]
Sabatini, John [1]
Borchers, Conrad [3]
Weitekamp, Daniel [3]
Cao, Meng [3]
Hollander, John [5]
Hu, Xiangen [6]
Graesser, Arthur C. [1]
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China
[3] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
[4] Monash Univ, Fac Informat Technol, Ctr Learning Analyt, Clayton, Vic 3800, Australia
[5] Arkansas State Univ, Jonesboro, AR 72401 USA
[6] Hong Kong Polytech Univ, Dept Appl Social Sci, Hong Kong, Peoples R China
Keywords
Data augmentation; data sparsity; generative artificial intelligence (GenAI); intelligent tutoring system (ITS); learning performance data; FRAMEWORK;
DOI
10.1109/TLT.2025.3526582
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Learning performance data, such as correct or incorrect answers and problem-solving attempts in intelligent tutoring systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This data sparsity makes it difficult to use learner models to predict learners' future performance and to explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data are represented as a 3-D tensor with dimensions corresponding to learners, questions, and attempts, which captures longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation in knowledge tracing (KT) tasks that predict missing performance values from real observations. Third, generative artificial intelligence models, namely generative adversarial networks (GANs, specifically vanilla GANs) and generative pretrained transformers (GPTs, specifically GPT-4o), are used to generate augmented data tailored to individual clusters of learning performance. We tested this systematic framework on adult literacy datasets from AutoTutor lessons developed for adult reading comprehension. We found that tensor factorization outperformed baseline KT techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation. The vanilla GAN-based augmentation demonstrated greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution.
This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach for the evaluation and optimization of ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics.
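The first two steps of the framework (a learners × questions × attempts tensor, followed by factorization-based imputation) can be illustrated with a minimal sketch. The abstract does not say which factorization the authors use, so the rank-R CP (CANDECOMP/PARAFAC) model below, the gradient-descent fitting, the function name `cp_impute`, and all hyperparameters are assumptions chosen for illustration; the toy data are randomly generated and do not come from the AutoTutor datasets.

```python
import numpy as np

def cp_impute(tensor, rank=2, steps=500, lr=0.01, reg=1e-3, seed=0):
    """Fill NaN entries of a learners x questions x attempts tensor.

    Hypothetical sketch: fits a rank-`rank` CP model by gradient descent
    on the observed entries only, then uses the reconstruction to fill
    the missing ones. Not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(tensor)                 # True where a value was observed
    observed = np.where(mask, tensor, 0.0)   # zeros are ignored via the mask
    n_learners, n_questions, n_attempts = tensor.shape

    # One factor matrix per tensor mode, initialised with small positive values.
    A = rng.uniform(0.1, 1.0, size=(n_learners, rank))
    B = rng.uniform(0.1, 1.0, size=(n_questions, rank))
    C = rng.uniform(0.1, 1.0, size=(n_attempts, rank))

    for _ in range(steps):
        recon = np.einsum("ir,jr,kr->ijk", A, B, C)
        err = mask * (recon - observed)      # error on observed cells only
        grad_A = np.einsum("ijk,jr,kr->ir", err, B, C) + reg * A
        grad_B = np.einsum("ijk,ir,kr->jr", err, A, C) + reg * B
        grad_C = np.einsum("ijk,ir,jr->kr", err, A, B) + reg * C
        A -= lr * grad_A
        B -= lr * grad_B
        C -= lr * grad_C

    # Clip predictions to [0, 1] so they read as performance probabilities.
    recon = np.clip(np.einsum("ir,jr,kr->ijk", A, B, C), 0.0, 1.0)
    # Keep real observations untouched; fill only the missing cells.
    return np.where(mask, tensor, recon)

# Toy data: 6 learners, 5 questions, 4 attempts; 1 = correct, 0 = incorrect.
rng = np.random.default_rng(1)
full = (rng.random((6, 5, 4)) < 0.6).astype(float)
sparse = full.copy()
sparse[rng.random(full.shape) < 0.8] = np.nan   # hide roughly 80% of entries
filled = cp_impute(sparse)
print("missing before:", int(np.isnan(sparse).sum()),
      "missing after:", int(np.isnan(filled).sum()))
```

The mask keeps the fit grounded in real observations, which mirrors the abstract's point that imputation predicts missing performance values from observed ones. The third step (GAN- or GPT-4o-based generation per performance cluster) is not sketched here because the abstract gives no architectural or prompting details.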
Pages: 145-164
Page count: 20