Apache Spark is one of the most commonly used frameworks for Big Data processing. Research on its streaming dynamic resource allocation feature has shown that large data load fluctuations, for instance in website traffic, have a negative impact on automatic scaling. Research has also indicated that the root cause of this issue is the lack of data load prediction, which aims to identify the expected data load increase on peak hours/days. Hence, this paper proposes an enhanced solution, namely KORD-I (Knowledge-based Orchestrated Resource DIstribution), which aims to optimise the allocation of Spark resources for Streaming applications in real time using a SARIMAX model. The experimental evaluation proves that the proposed solution provides a cost reduction of 38% without affecting stability.
Kordelas, A, Spyrou, T, Voulgaris, S, Megalooikonomou, V & Deligiannis, N 2023, KORD-I: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming. in 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-23). pp. 1-3, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Raleigh, North Carolina, United States, 23/04/23.
Kordelas, A., Spyrou, T., Voulgaris, S., Megalooikonomou, V., & Deligiannis, N. (Accepted/In press). KORD-I: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming. In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-23) (pp. 1-3)
@inproceedings{b0c644ec746c4a52b778639afbf7c753,
title = "KORD-I: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming",
abstract = "Apache Spark is one of the most commonly used frameworks for Big Data processing. Research on its streaming dynamic resource allocation feature has shown that large data load fluctuations, for instance in website traffic, have a negative impact on automatic scaling. Research has also indicated that the root cause of this issue is the lack of data load prediction, which aims to identify the expected data load increase on peak hours/days. Hence, this paper proposes an enhanced solution, namely KORD-I (Knowledge-based Orchestrated Resource DIstribution), which aims to optimise the allocation of Spark resources for Streaming applications in real time using a SARIMAX model. The experimental evaluation proves that the proposed solution provides a cost reduction of 38% without affecting stability.",
author = "Athanasios Kordelas and Thanasis Spyrou and Spyros Voulgaris and Vasileios Megalooikonomou and Nikos Deligiannis",
year = "2023",
language = "English",
pages = "1--3",
booktitle = "2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-23)",
note = "2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ; Conference date: 23-04-2023 Through 25-04-2023",
url = "https://ispass.org/ispass2023/",
}