Optimizing Query Processing for Scalable Educational Data Analytics: A Comprehensive Case Study with the OULAD Dataset
DOI: https://doi.org/10.63075/7stb7t17
Keywords: Query Processing, Scalable Educational Data Analytics, OULAD Dataset
Abstract
The rapid proliferation of e-learning platforms has ushered in an era of unprecedented data generation, creating opportunities for advanced predictive modeling and personalized educational experiences. The Open University Learning Analytics Dataset (OULAD), which comprises 10.6 million virtual learning environment (VLE) interaction records, serves as a benchmark for educational data mining (EDM) but poses significant preprocessing challenges because of its scale. The baseline OULAD preprocessing pipeline achieves 97% accuracy in course outcome prediction but requires 11.75 minutes and 12 GB of memory, which limits its scalability for real-time applications and resource-constrained environments. This study optimizes the pipeline by combining parallel processing with Dask, index-based merging, selective feature processing, sparse encoding, and chunked resampling. On a Google Colab notebook with 16 GB of RAM, the optimized pipeline runs in 4.5 minutes and uses only 7 GB of memory (roughly a 2.6-fold speedup and a 42% reduction in memory use) while retaining the baseline's 97% accuracy. This research develops a database analytics framework that combines classic optimization methods with intelligent agent techniques to support educational data processing in both offline and online learning systems.
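To make three of the named ideas concrete (chunked reading, index-based merging, and sparse encoding), here is a minimal sketch under stated assumptions. It is not the study's pipeline: the study reports using Dask for parallelism, whereas this sketch is single-process and uses only the Python standard library. The column names `id_student`, `sum_click`, and `code_module` are assumed to follow OULAD's `studentVle` and `studentInfo` tables; the function names and the toy rows are hypothetical and chosen purely for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def total_clicks_chunked(csv_lines, chunk_size=10_000):
    """Sum sum_click per id_student, parsing at most chunk_size rows at a time.

    csv_lines can be an open file, so the full interaction log never has to be
    materialized in memory; only the per-student running totals are kept.
    (Hypothetical helper; column names assumed to match OULAD's studentVle.)
    """
    totals = defaultdict(int)
    reader = csv.DictReader(csv_lines)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            totals[row["id_student"]] += int(row["sum_click"])
    return dict(totals)


def index_join(click_index, students, key="id_student"):
    """Attach total clicks to each student via dict lookups (a hash join),
    instead of scanning the click data once per student."""
    return [{**s, "total_clicks": click_index.get(s[key], 0)} for s in students]


def sparse_one_hot(value, vocabulary):
    """Encode a categorical value as {column_index: 1}, storing only the non-zero entry."""
    return {vocabulary[value]: 1} if value in vocabulary else {}


# Toy inputs (hypothetical values, not real OULAD records).
vle_csv = io.StringIO(
    "id_student,id_site,sum_click\n"
    "s1,100,5\n"
    "s1,101,3\n"
    "s2,100,10\n"
)
students = [
    {"id_student": "s1", "code_module": "AAA"},
    {"id_student": "s2", "code_module": "BBB"},
    {"id_student": "s3", "code_module": "AAA"},
]

clicks = total_clicks_chunked(vle_csv, chunk_size=2)
merged = index_join(clicks, students)
module_vocab = {"AAA": 0, "BBB": 1, "CCC": 2}
encoded = [sparse_one_hot(s["code_module"], module_vocab) for s in merged]

print(clicks)                               # {'s1': 8, 's2': 10}
print([s["total_clicks"] for s in merged])  # [8, 10, 0]
print(encoded)                              # [{0: 1}, {1: 1}, {0: 1}]
```

The aggregation keeps memory proportional to the number of students rather than the number of interaction rows, and the dict index turns each merge lookup into an average constant-time operation. In a pandas or Dask setting the same ideas would typically map onto partitioned CSV reads, joins on a set index, and sparse matrix types such as those in `scipy.sparse`.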