This work presents an in-depth study of the analytical models for the performance estimation of GPUs. We show that the models' analytical equations can be derived from a pipeline analogy that models each GPU subsystem as an abstract pipeline. We call this the Pipeline model. All the equations are reformulated based on generic pipeline characteristics, namely throughput and latency. Our analysis shows equivalences between models and reveals substantial problems with some of the equations. Rather than relying on equations, the Pipeline model is then used to simulate the behavior of kernel executions based on the same hardware parameters as the analytical models. The simplicity of the model and its reliance on simulation mean that this approach needs fewer assumptions, is more comprehensive, and is more flexible. More performance aspects can be taken into consideration. The different models are compared and evaluated empirically with 14 kernels of the Rodinia benchmark suite with varying occupancy. The Pipeline model gives an average mean absolute percentage error (MAPE) of 24, while the average MAPE values of the other models lie between 27 and 136.
Lemeire, J & Cornelis, JG 2023, 'Analysis of the Analytical Performance Models for GPUs and Extracting the Underlying Pipeline Model', Journal of Parallel and Distributed Computing, vol. 173, pp. 32-47. https://doi.org/10.1016/j.jpdc.2022.11.002
Lemeire, J., & Cornelis, J. G. (2023). Analysis of the Analytical Performance Models for GPUs and Extracting the Underlying Pipeline Model. Journal of Parallel and Distributed Computing, 173, 32-47. https://doi.org/10.1016/j.jpdc.2022.11.002
@article{a8033e7f4c8a4c2094924de1189e1d9d,
title = "Analysis of the Analytical Performance Models for GPUs and Extracting the Underlying Pipeline Model",
abstract = "This work presents an in-depth study of the analytical models for the performance estimation of GPUs. We show that the models' analytical equations can be derived from a pipeline analogy that models each GPU subsystem as an abstract pipeline. We call this the Pipeline model. All the equations are reformulated based on generic pipeline characteristics, namely throughput and latency. Our analysis shows equivalences between models and reveals substantial problems with some of the equations. Rather than relying on equations, the Pipeline model is then used to simulate the behavior of kernel executions based on the same hardware parameters as the analytical models. The simplicity of the model and its reliance on simulation mean that this approach needs fewer assumptions, is more comprehensive, and is more flexible. More performance aspects can be taken into consideration. The different models are compared and evaluated empirically with 14 kernels of the Rodinia benchmark suite with varying occupancy. The Pipeline model gives an average mean absolute percentage error (MAPE) of 24, while the average MAPE values of the other models lie between 27 and 136.",
keywords = "GPU Computing, GPU architecture, Analytical Performance Models",
author = "Lemeire, Jan and Cornelis, {Jan G.}",
note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Inc.",
year = "2023",
month = mar,
doi = "10.1016/j.jpdc.2022.11.002",
language = "English",
volume = "173",
pages = "32--47",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
}