Optimizing Query Processing for Scalable Educational Data Analytics: A Comprehensive Case Study with the OULAD Dataset

Authors

  • Shahzadi Sammar Rasheed
  • Muhammad Arif
  • Hassan Nawaz
  • Muhammad Dahir Zeb

DOI:

https://doi.org/10.63075/7stb7t17

Keywords:

Query Processing, Scalable Educational Data Analytics, OULAD Dataset

Abstract

The rapid proliferation of e-learning platforms has produced data at an unprecedented scale, creating opportunities for advanced predictive modeling and personalized educational experiences. The Open University Learning Analytics Dataset (OULAD), which comprises 10.6 million virtual learning environment (VLE) interaction records, serves as a benchmark for educational data mining (EDM) but poses significant preprocessing challenges because of its size. The baseline preprocessing pipeline for OULAD achieves 97% accuracy in course outcome prediction, yet it requires 11.75 minutes and 12 GB of memory, which limits its scalability for real-time applications and resource-constrained environments. This study optimizes the pipeline by integrating parallel processing with Dask, index-based merging, selective feature processing, sparse encoding, and chunked resampling. On a Google Colab notebook with 16 GB of RAM, the optimized pipeline runs in 4.5 minutes and uses 7 GB of memory while matching the baseline's 97% accuracy. The result is a database analytics framework that combines classic optimization methods with intelligent agent techniques to support educational data processing in both offline and online learning systems.
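Three of the techniques named in the abstract (chunked processing, selective feature processing, and index-based merging) can be illustrated with a minimal sketch. This is not the authors' Dask pipeline: it is a plain-Python stand-in that uses only the standard library, and the inline rows are synthetic. The column names (id_student, sum_click, final_result) are assumed from the public OULAD tables rather than taken from the abstract, and the helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice

# Tiny synthetic stand-ins for two OULAD tables (values are made up;
# column names are assumed from the public dataset, not from the abstract).
STUDENT_VLE = """id_student,id_site,date,sum_click
1,10,0,3
2,10,0,1
1,11,1,4
3,12,2,2
2,11,3,5
"""
STUDENT_INFO = """id_student,final_result
1,Pass
2,Fail
3,Withdrawn
"""


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows so the full table is never held at once."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


def total_clicks_by_student(vle_file, chunk_size=2):
    """Aggregate clicks chunk by chunk, using only the two columns needed."""
    totals = defaultdict(int)
    reader = csv.DictReader(vle_file)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            totals[int(row["id_student"])] += int(row["sum_click"])
    return dict(totals)


def merge_on_index(info_file, totals):
    """Join outcomes to click totals through a keyed lookup instead of a row scan."""
    merged = {}
    for row in csv.DictReader(info_file):
        sid = int(row["id_student"])
        merged[sid] = (row["final_result"], totals.get(sid, 0))
    return merged


totals = total_clicks_by_student(io.StringIO(STUDENT_VLE))
features = merge_on_index(io.StringIO(STUDENT_INFO), totals)
print(features)
```

Because the click totals are keyed by student ID, each join is a constant-time lookup, and only one chunk of interaction rows is materialized at a time. A Dask-based version would presumably distribute the same per-chunk aggregation across partitions.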

Published

2025-04-26

How to Cite

Optimizing Query Processing for Scalable Educational Data Analytics: A Comprehensive Case Study with the OULAD Dataset. (2025). Annual Methodological Archive Research Review, 3(4), 495-510. https://doi.org/10.63075/7stb7t17
