Publication Details
Athanasios Kordelas, Thanasis Spyrou, Spyros Voulgaris, Vasileios Megalooikonomou, Nikos Deligiannis

2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-23)

Contribution To Book Anthology


Apache Spark is one of the most commonly used frameworks for Big Data processing. Research on its streaming dynamic resource allocation feature has shown that large data load fluctuations, for instance in website traffic, have a negative impact on automatic scaling. Research has also indicated that the root cause of this issue is the lack of data load prediction, that is, of identifying the expected increase in data load during peak hours or days. Hence, this paper proposes an enhanced solution, KORDI (Knowledge-based Orchestrated Resource DIstribution), which aims to optimise the allocation of Spark resources for Streaming applications in real time using a SARIMAX model. The experimental evaluation shows that the proposed solution reduces cost by 38% without affecting stability.