Pooling layers are essential building blocks of convolutional neural networks (CNNs), used to reduce computational overhead and increase the receptive fields of subsequent convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both requirements remains a challenge. To this end, we propose an adaptive and exponentially weighted pooling method: adaPool. Our method learns a region-specific fusion of two sets of pooling kernels that are based on the exponent of the Dice-Sørensen coefficient and the exponential maximum, respectively. AdaPool improves the preservation of detail on a range of tasks including image and video classification and object detection. A key property of adaPool is its bidirectional nature. In contrast to common pooling methods, the learned weights can also be used to upsample activation maps. We term this method adaUnPool. We evaluate adaUnPool on image and video super-resolution and frame interpolation. For benchmarking, we introduce Inter4K, a novel high-quality, high frame-rate video dataset. Our experiments demonstrate that adaPool systematically achieves better results across tasks and backbones, while introducing a minor additional computational and memory overhead.
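The abstract's two pooling kernels can be illustrated with a minimal NumPy sketch. This is a hedged reading of the description, not the authors' released implementation: `em_pool` weights each activation by the softmax of the activations themselves (the "exponential maximum"), `edscw_pool` weights each activation by the softmaxed Dice-Sørensen similarity to the region mean, and `ada_pool` fuses the two with a per-region parameter `beta` (learned in the paper; a plain scalar here). Function names, the `1e-12` stabilizer, and the choice of the region mean as the DSC reference are all assumptions for illustration.

```python
import numpy as np

def softmax_weights(v):
    # Numerically stable softmax over a flattened pooling region.
    e = np.exp(v - v.max())
    return e / e.sum()

def em_pool(region):
    # Exponential-maximum pooling: activations weighted by the softmax
    # of their own values, so larger activations dominate the output.
    w = softmax_weights(region)
    return float((w * region).sum())

def edscw_pool(region):
    # Dice-Sørensen-style pooling (illustrative): each activation is
    # weighted by the softmaxed DSC-like similarity to the region mean.
    mu = region.mean()
    dsc = 2.0 * np.abs(region * mu) / (region ** 2 + mu ** 2 + 1e-12)
    w = softmax_weights(dsc)
    return float((w * region).sum())

def ada_pool(region, beta):
    # Region-wise fusion of the two kernels; beta in [0, 1] is learned
    # per region in the paper, passed in as a constant here.
    return beta * edscw_pool(region) + (1.0 - beta) * em_pool(region)
```

Because both kernels compute convex combinations of the region's activations, every output stays within the region's value range, which is what lets the same weights be reused for upsampling (adaUnPool).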
Stergiou, A & Poppe, R 2023, 'AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling', IEEE Transactions on Image Processing, vol. 32, pp. 251-266. https://doi.org/10.1109/TIP.2022.3227503
Stergiou, A., & Poppe, R. (2023). AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling. IEEE Transactions on Image Processing, 32, 251-266. https://doi.org/10.1109/TIP.2022.3227503
@article{70223389c6d843bc9bbdc18b6f6e6645,
title = "AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling",
abstract = "Pooling layers are essential building blocks of convolutional neural networks (CNNs), used to reduce computational overhead and increase the receptive fields of subsequent convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both requirements remains a challenge. To this end, we propose an adaptive and exponentially weighted pooling method: adaPool. Our method learns a region-specific fusion of two sets of pooling kernels that are based on the exponent of the Dice-S{\o}rensen coefficient and the exponential maximum, respectively. AdaPool improves the preservation of detail on a range of tasks including image and video classification and object detection. A key property of adaPool is its bidirectional nature. In contrast to common pooling methods, the learned weights can also be used to upsample activation maps. We term this method adaUnPool. We evaluate adaUnPool on image and video super-resolution and frame interpolation. For benchmarking, we introduce Inter4K, a novel high-quality, high frame-rate video dataset. Our experiments demonstrate that adaPool systematically achieves better results across tasks and backbones, while introducing a minor additional computational and memory overhead.",
keywords = "Computer architecture, downsampling, Interpolation, Kernel, pooling, Superresolution, Task analysis, upsampling, Visualization, Weight measurement",
author = "Alexandros Stergiou and Ronald Poppe",
note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE. Copyright 2023 Elsevier B.V., All rights reserved.",
year = "2023",
doi = "10.1109/TIP.2022.3227503",
language = "English",
volume = "32",
pages = "251--266",
journal = "IEEE Transactions on Image Processing",
issn = "1057-7149",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}