Today, most reduction algorithms are optimized for balanced workloads; they assume all processes will start the reduction at about the same time. In practice, however, this is not always the case: significant load imbalances may occur and affect the performance of these algorithms. In this paper we investigate the impact of such imbalances on the most commonly employed reduction algorithms and propose a new algorithm specifically adapted to this setting. First, we analyze the optimistic case where we have a priori knowledge of all imbalances and propose a near-optimal solution. Then, for the general case, where we have no foreknowledge of the imbalances, we propose a dynamically rebalanced tree reduction algorithm. We show experimentally that this algorithm performs better than the default OpenMPI and MVAPICH2 implementations.
Marendic, P, Lemeire, J, Haber, T, Vucinic, D & Schelkens, P 2012, An Investigation into the Performance of Reduction Algorithms under Load Imbalance. in Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing. Springer, pp. 439-450, the 18th international conference on Parallel Processing, Rhodes Island, Greece, 27/08/12. <http://www.springerlink.com/content/80866255u35n2177/>
Marendic, P., Lemeire, J., Haber, T., Vucinic, D., & Schelkens, P. (2012). An Investigation into the Performance of Reduction Algorithms under Load Imbalance. In Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing (pp. 439-450). Springer. http://www.springerlink.com/content/80866255u35n2177/
@inproceedings{2f68d471ee6c483eaad0574d29f6497e,
title = "An Investigation into the Performance of Reduction Algorithms under Load Imbalance",
abstract = "Today, most reduction algorithms are optimized for balanced workloads; they assume all processes will start the reduction at about the same time. In practice, however, this is not always the case: significant load imbalances may occur and affect the performance of these algorithms. In this paper we investigate the impact of such imbalances on the most commonly employed reduction algorithms and propose a new algorithm specifically adapted to this setting. First, we analyze the optimistic case where we have a priori knowledge of all imbalances and propose a near-optimal solution. Then, for the general case, where we have no foreknowledge of the imbalances, we propose a dynamically rebalanced tree reduction algorithm. We show experimentally that this algorithm performs better than the default OpenMPI and MVAPICH2 implementations.",
keywords = "MPI, imbalance, collective, reduction, process skew, benchmarking",
author = "Petar Marendic and Jan Lemeire and Tom Haber and Dean Vucinic and Peter Schelkens",
year = "2012",
language = "English",
isbn = "978-3-642-32819-0",
pages = "439--450",
booktitle = "Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing",
publisher = "Springer",
note = "18th International Conference on Parallel Processing, Euro-Par'12; conference date: 27-08-2012 through 31-08-2012",
}