Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations with potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.
Stergiou, A 2020, 'Learn to cycle: Time-consistent feature discovery for action recognition', Pattern Recognition Letters, vol. 141, pp. 1-7. https://doi.org/10.1016/j.patrec.2020.11.012
Stergiou, A. (2020). Learn to cycle: Time-consistent feature discovery for action recognition. Pattern Recognition Letters, 141, 1-7. https://doi.org/10.1016/j.patrec.2020.11.012
@article{0e1968b6ef1c4ed1bef8260aedda1003,
title = "Learn to cycle: Time-consistent feature discovery for action recognition",
abstract = "Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations with potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.",
author = "Alexandros Stergiou",
note = "Funding Information: This publication is supported by the Netherlands Organization for Scientific Research (NWO) with a TOP-C2 grant for “Automatic recognition of bodily interactions” (ARBITER). Publisher Copyright: {\textcopyright} 2020 Elsevier B.V. All rights reserved.",
year = "2020",
month = nov,
doi = "10.1016/j.patrec.2020.11.012",
language = "English",
volume = "141",
pages = "1--7",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
}