Apache Spark Streaming applications face a core resource management challenge: too few executors cause out- of-memory(OOM) failures while over-provisioning wastes com- putational resources. We present an online machine learning framework that predicts near-future memory needs using real- time executor-level metrics from Spark{\textquoteright}s internal memory man- agement system, including JVM heap and off-heap memory, execution and storage pools and garbage collection statistics. The framework scales executors between 4, 6 and 8 instances (each with 4GB RAM), applying a downscale cooldown to prevent premature scale-down decisions. We evaluate seven regression models in a simulation study designed as a controlled precursor to real-world deployment [1], under concept drift across 500 stream- ing batches. Bayesian Ridge reaches the best prediction quality (R2 = 0.91, MAE = 1,260 MB) with 90.2\% scaling accuracy and only 2.3\% OOM rate. A cooldown sensitivity analysis shows the trade-off between OOM reduction and resource efficiency, giving practical guidance for production deployment.
Kordelas, A & Deligiannis, N 2026, Online Machine Learning for Predictive Executor Memory Scaling in Apache Spark Streaming. in International Conference on Telecommunications (ICT) 2026. IEEE, pp. 93-98.
Kordelas, A., & Deligiannis, N. (Accepted/In press). Online Machine Learning for Predictive Executor Memory Scaling in Apache Spark Streaming. In International Conference on Telecommunications (ICT) 2026 (pp. 93-98). IEEE.
@inproceedings{6c182dfc248e4f07bb13061160d96949,
title = "Online Machine Learning for Predictive Executor Memory Scaling in Apache Spark Streaming",
abstract = "Apache Spark Streaming applications face a core resource management challenge: too few executors cause out- of-memory(OOM) failures while over-provisioning wastes com- putational resources. We present an online machine learning framework that predicts near-future memory needs using real- time executor-level metrics from Spark{\textquoteright}s internal memory man- agement system, including JVM heap and off-heap memory, execution and storage pools and garbage collection statistics. The framework scales executors between 4, 6 and 8 instances (each with 4GB RAM), applying a downscale cooldown to prevent premature scale-down decisions. We evaluate seven regression models in a simulation study designed as a controlled precursor to real-world deployment [1], under concept drift across 500 stream- ing batches. Bayesian Ridge reaches the best prediction quality (R2 = 0.91, MAE = 1,260 MB) with 90.2\% scaling accuracy and only 2.3\% OOM rate. A cooldown sensitivity analysis shows the trade-off between OOM reduction and resource efficiency, giving practical guidance for production deployment.",
author = "Athanasios Kordelas and Nikos Deligiannis",
year = "2026",
language = "English",
pages = "93--98",
booktitle = "International Conference on Telecommunications (ICT) 2026",
publisher = "IEEE",
}