Publication Details
, Ali Gorji, André Bourdoux, Sofie Pollin, Hichem Sahli

IEEE Access

Contribution To Journal


In this paper, we propose a Multi-View Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) network that fuses multiple "views" of the time-range-Doppler radar data cube for human activity recognition. Convolutional neural networks extract optimal frame-based features from the time-range, time-Doppler and range-Doppler projections of the radar data cube. The CNN models are first pre-trained without supervision within a Convolutional Auto-Encoder (CAE) topology. The pre-trained encoder parameters are then fine-tuned to extract intermediate frame-based representations, which are subsequently aggregated by LSTM networks for sequence classification. The temporal correlation among the views is explicitly learned by sharing the LSTM network weights across the different views. Moreover, we propose range and Doppler energy dispersion features and temporal-difference-based features as inputs to the CNN-LSTM models. Furthermore, we investigate the use of target tracking features as auxiliary side information. The proposed model is trained on datasets collected in both cluttered and uncluttered environments. For validation, an independent test dataset with unseen participants was collected in a cluttered environment. Fusion with the auxiliary features improves generalization by 5%, yielding an overall macro F1-score of 74.7%.
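The abstract does not spell out how the three views are computed. As a minimal sketch, assuming the data cube for one frame window is stored as non-negative magnitudes indexed `cube[t][r][d]` (time, range, Doppler), each 2-D view can be obtained by summing over the axis it drops, and a temporal-difference input can be formed as the first-order difference of a time-indexed view along the time axis. The names `project_views` and `temporal_difference`, the array layout, and the sum-based projection are illustrative assumptions, not the paper's definitions.

```python
from typing import Dict, List

# Hypothetical layout (assumption, not from the paper): cube[t][r][d] holds a
# non-negative magnitude for time bin t, range bin r and Doppler bin d.
Cube = List[List[List[float]]]
Matrix = List[List[float]]


def project_views(cube: Cube) -> Dict[str, Matrix]:
    """Collapse one axis of a time-range-Doppler cube by summing over it."""
    n_t, n_r, n_d = len(cube), len(cube[0]), len(cube[0][0])
    # Time-range view: sum over Doppler bins -> shape (n_t, n_r).
    time_range = [[sum(cube[t][r][d] for d in range(n_d)) for r in range(n_r)]
                  for t in range(n_t)]
    # Time-Doppler view: sum over range bins -> shape (n_t, n_d).
    time_doppler = [[sum(cube[t][r][d] for r in range(n_r)) for d in range(n_d)]
                    for t in range(n_t)]
    # Range-Doppler view: sum over time bins -> shape (n_r, n_d).
    range_doppler = [[sum(cube[t][r][d] for t in range(n_t)) for d in range(n_d)]
                     for r in range(n_r)]
    return {"time_range": time_range,
            "time_doppler": time_doppler,
            "range_doppler": range_doppler}


def temporal_difference(view: Matrix) -> Matrix:
    """First-order difference along the time axis (rows): view[t+1] - view[t]."""
    return [[curr - prev for prev, curr in zip(row_prev, row_curr)]
            for row_prev, row_curr in zip(view, view[1:])]


if __name__ == "__main__":
    # Toy 2x2x2 cube: 2 time bins, 2 range bins, 2 Doppler bins.
    cube = [[[1, 2], [3, 4]],
            [[5, 6], [7, 8]]]
    views = project_views(cube)
    print(views["time_range"])     # [[3, 7], [11, 15]]
    print(views["time_doppler"])   # [[4, 6], [12, 14]]
    print(views["range_doppler"])  # [[6, 8], [10, 12]]
    print(temporal_difference(views["time_range"]))  # [[8, 8]]
```

In a multi-view pipeline of the kind described above, each view would feed its own CNN encoder; this sketch only illustrates the projection and differencing steps, and a vectorized array library would normally replace the nested loops for realistic cube sizes.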
