Online Machine Learning for Predictive Executor Memory Scaling in Apache Spark Streaming

Online Machine Learning for Predictive Executor Memory Scaling in Apache Spark Streaming ■

Abstract ■

Apache Spark Streaming applications face a core resource management challenge: too few executors cause out- of-memory(OOM) failures while over-provisioning wastes com- putational resources. We present an online machine learning framework that predicts near-future memory needs using real- time executor-level metrics from Spark{\textquoteright}s internal memory man- agement system, including JVM heap and off-heap memory, execution and storage pools and garbage collection statistics. The framework scales executors between 4, 6 and 8 instances (each with 4GB RAM), applying a downscale cooldown to prevent premature scale-down decisions. We evaluate seven regression models in a simulation study designed as a controlled precursor to real-world deployment [1], under concept drift across 500 stream- ing batches. Bayesian Ridge reaches the best prediction quality (R2 = 0.91, MAE = 1,260 MB) with 90.2\% scaling accuracy and only 2.3\% OOM rate. A cooldown sensitivity analysis shows the trade-off between OOM reduction and resource efficiency, giving practical guidance for production deployment.