In this paper we present microbenchmarks in OpenCL to measure the most important performance characteristics of GPUs. Microbenchmarks try to measure individual characteristics that influence the performance. First, performance, in operations or bytes per second, is measured with respect to the occupancy and as such provides an occupancy roofline curve. The curve shows at which occupancy level peak performance is reached. Second, when considering the cycles per instruction of each compute unit, we measure the two most important characteristics of an instruction: its issue and completion latency. This is based on modeling each compute unit as a pipeline for computations and a pipeline for the memory access. We also measure some specific characteristics: the influence of independent instructions within a kernel and thread divergence. We argue that these are the most important characteristics for understanding and predicting performance. The results for several Nvidia and AMD GPUs are provided. A free Java application containing the microbenchmarks is available on www.gpuperformance.org.
Lemeire, J, Cornelis, JG & Segers, L 2016, Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model. in Proceedings of 24th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). IEEE, pp. 456-463, PDP 2016, Heraklion, Greece, 17/02/16.
Lemeire, J., Cornelis, J. G., & Segers, L. (2016). Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model. In Proceedings of 24th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (pp. 456-463). IEEE.
@inproceedings{742d75d333984cacbf640f89cbf142cd,
title = "Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model",
abstract = "In this paper we present microbenchmarks in OpenCL to measure the most important performance characteristics of GPUs. Microbenchmarks try to measure individual characteristics that influence the performance. First, performance, in operations or bytes per second, is measured with respect to the occupancy and as such provides an occupancy roofline curve. The curve shows at which occupancy level peak performance is reached. Second, when considering the cycles per instruction of each compute unit, we measure the two most important characteristics of an instruction: its issue and completion latency. This is based on modeling each compute unit as a pipeline for computations and a pipeline for the memory access. We also measure some specific characteristics: the influence of independent instructions within a kernel and thread divergence. We argue that these are the most important characteristics for understanding and predicting performance. The results for several Nvidia and AMD GPUs are provided. A free Java application containing the microbenchmarks is available on www.gpuperformance.org.",
keywords = "microbenchmarks, GPU, OpenCL, Performance Analysis",
author = "Jan Lemeire and Cornelis, {Jan G.} and Laurent Segers",
year = "2016",
month = feb,
day = "18",
language = "English",
isbn = "978-1-4673-8775-0",
pages = "456--463",
booktitle = "Proceedings of 24th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)",
publisher = "IEEE",
note = "PDP 2016 ; Conference date: 17-02-2016 Through 19-02-2016",
}