This paper discusses an OpenCL version of a volumetric JPEG 2000 codec that runs on GPUs, multi-core processors, or a combination of both. Since the performance-critical part consists of a fine-grained algorithm (the discrete wavelet transform) and a coarse-grained algorithm (Tier-1), the best performance is obtained with a hybrid execution in which the discrete wavelet transform is executed on a GPU and Tier-1 on a multi-core processor. Using an Intel i7 multi-core processor in combination with a modest NVIDIA Quadro K620 GPU yields speedups greater than 10 compared with the original sequential code. The paper discusses the performance bottlenecks that arise on GPUs when parallelizing algorithms that are coarse-grained by nature, along with the possible optimizations. A performance analysis reveals the inefficiencies and explains the deviations from the GPU peak performance.
Cornelis, JG, Lemeire, J, Bruylants, T & Schelkens, P 2017, 'Heterogeneous acceleration of volumetric JPEG 2000 using OpenCL', International Journal of High Performance Computing Applications, vol. 31, no. 3, pp. 229-245. https://doi.org/10.1177/1094342016646438
Cornelis, J. G., Lemeire, J., Bruylants, T., & Schelkens, P. (2017). Heterogeneous acceleration of volumetric JPEG 2000 using OpenCL. International Journal of High Performance Computing Applications, 31(3), 229-245. https://doi.org/10.1177/1094342016646438
@article{66f7e98efb4a414cb8d65c0cf5c45dcb,
title = "Heterogeneous acceleration of volumetric {JPEG} 2000 using {OpenCL}",
abstract = "This paper discusses an OpenCL version of a volumetric JPEG 2000 codec that runs on GPUs, multi-core processors, or a combination of both. Since the performance-critical part consists of a fine-grained algorithm (the discrete wavelet transform) and a coarse-grained algorithm (Tier-1), the best performance is obtained with a hybrid execution in which the discrete wavelet transform is executed on a GPU and Tier-1 on a multi-core processor. Using an Intel i7 multi-core processor in combination with a modest NVIDIA Quadro K620 GPU yields speedups greater than 10 compared with the original sequential code. The paper discusses the performance bottlenecks that arise on GPUs when parallelizing algorithms that are coarse-grained by nature, along with the possible optimizations. A performance analysis reveals the inefficiencies and explains the deviations from the GPU peak performance.",
keywords = "GPU, Hybrid, multi-core, OpenCL, volumetric JPEG 2000",
author = "Cornelis, {Jan G.} and Jan Lemeire and Tim Bruylants and Peter Schelkens",
year = "2017",
month = may,
day = "1",
doi = "10.1177/1094342016646438",
language = "English",
volume = "31",
pages = "229--245",
journal = "International Journal of High Performance Computing Applications",
issn = "1094-3420",
publisher = "SAGE Publications Ltd",
number = "3",
}