“Signal Processing in the AI era” was the tagline of this year’s IEEE International Conference on Acoustics, Speech and Signal Processing, taking place in Rhodes, Greece.
In this context, Brent de Weerdt, Xiangyu Yang, Boris Joukovsky, Alex Stergiou and Nikos Deligiannis presented ETRO’s research during poster sessions and oral presentations, with novel ways to process and understand graph, video, and audio data. Nikos Deligiannis chaired a session on Graph Deep Learning, attended the IEEE T-IP Editorial Board Meeting, and had the opportunity to meet with collaborators from the VUB-Duke-Ugent-UCL joint lab.
Featured articles:

Five young academics have been chosen to take on administrative tasks for a year in addition to their academic work to support the rector and vice-rectors. From ETRO, civil engineer Jeroen Van Schependom will take care of the vice-rectorate Research Policy.
These five academic staff members will have the opportunity, alongside the current management team, to develop their leadership potential and inspire the rectoral policy team. They will devote one day a week within their current tenure to this new role. Each will work closely with the rector or a vice-rector in a specific policy area to gain a tangible view of what leadership and policymaking mean in practice.
“By giving young academics the opportunity to hone their policy competencies and weigh in on VUB policy, the university aims to increase its policy capability. The voice and views of our younger colleagues are absolutely essential. After all, they are also the leaders of the future,” says rector Caroline Pauwels.
An ETRO team participated in the 2nd COV19D Competition of the AIMIA Workshop at #ECCV2022 (https://mlearn.lincoln.ac.uk/eccv-2022-ai-mia/). This 2nd COV19D Competition included two Challenges: i) COVID19 Detection and ii) COVID19 Severity Detection. Our team with Abel Díaz, Tanmoy Mukherjee, Matías Bossa, Nikos Deligiannis, Hichem Sahli, and the IT support of Luc van Kempen submitted a solution that beat the Competition’s baseline on both challenges!
The figure illustrates the method used.

On July 2nd 2021 at 16:30, Abel Diaz Berenguer will defend his PhD entitled “Learning to predict human behavior in crowded scenes”.
Automatically understanding human behavior is one of the most fundamental research topics on the way towards socially aware, vision-based autonomous systems. There is increasing interest in incorporating the social-signal perspective into the learning-system pipeline. This dissertation focuses on developing and incorporating computational mechanisms from Computer Vision and Machine Learning to automatically analyze and predict human behavior in crowded scenes. Our research specifically addresses public safety assisted by autonomous video surveillance systems, aiming to decrease the human labor dedicated to video monitoring.
Our research efforts concentrate on the information processing pipeline of learning systems that cope with human trajectory prediction and human behavior analysis in crowded scenes. We contribute to human trajectory prediction in crowded scenes with (i) a novel latent variable model, aware of human-human and human-context interactions, that predicts plausible trajectories, and (ii) a novel latent location-velocity recurrent model that predicts varied yet feasible future trajectories. Towards detecting anomalous human behavior, we adopt two unsupervised approaches, based on the scene's dominant behavior and on the underlying properties of trajectories, to address trajectory-based anomaly detection. In addition, we contribute (iii) a supervised approach capable of learning discriminative sequence-based feature representations to recognize whether video sequences depict violent human behavior.
Extensive experiments on publicly available datasets demonstrate the effectiveness of our proposals.
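The dissertation does not publish code, but the core idea behind scene-dominant-behavior anomaly detection can be illustrated with a toy sketch: score each trajectory by how much its motion deviates from the scene's dominant motion pattern. The trajectories, the averaging scheme, and the `anomaly_scores` helper below are all invented for illustration, not the author's actual method:

```python
import numpy as np

def anomaly_scores(trajectories):
    """Score each trajectory by how far its mean velocity deviates from
    the scene's dominant (average) motion pattern."""
    velocities = np.diff(trajectories, axis=1)          # (N, T-1, 2) step vectors
    mean_vel = velocities.mean(axis=1)                  # (N, 2) per-trajectory motion
    dominant = mean_vel.mean(axis=0)                    # scene-level dominant motion
    return np.linalg.norm(mean_vel - dominant, axis=1)  # deviation per trajectory

# Three toy pedestrians: two walk right, one walks against the flow.
trajs = np.array([
    [[0, 0], [1, 0], [2, 0], [3, 0]],
    [[0, 1], [1, 1], [2, 1], [3, 1]],
    [[3, 2], [2, 2], [1, 2], [0, 2]],
], dtype=float)
scores = anomaly_scores(trajs)
print(scores.argmax())  # 2: the counter-flow trajectory scores highest
```

A real system would of course work with learned representations and far richer behavior models; this only shows the trajectory-level intuition.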
Several new buildings on Campus Oefenplein become operational.
On July 1st 2024 at 16:00, Panagiotis Gonidakis will defend their PhD entitled “Data- and Label-Efficient Deep Learning for Medical Image Analysis: Application to Lung Nodule Detection on Thoracic CT”.
Everybody is invited to attend the presentation in room D.0.03, or digitally via this link.
Convolutional neural networks (CNNs) have been widely used to detect and classify various objects and structures in computer vision and medical imaging. Access to large sets of annotated data is commonly a prerequisite for achieving good performance. In medical imaging, acquiring adequate amounts of labelled data can be time-consuming and costly. Therefore, reducing the need for data, and in particular for associated annotations, is of high importance for medical imaging applications. In this work we investigated whether we can lower the need for annotated data in a supervised learning classification problem.
We chose to tackle the problem of lung nodule detection in thoracic computed tomography (CT) imaging, as this widely investigated application allowed us to benefit from publicly available data and benchmark our methods. We designed a 3D CNN architecture to perform patch-wise classification of candidate nodules for false positive reduction. We optimized its training, testing and fine-tuning procedure, evaluated its performance, and compared it with other state-of-the-art approaches in the field.
Next, we explored how data augmentation can contribute towards more accurate and less data-demanding models. We investigated the relative benefit of increasing the amount of original data with respect to computationally augmenting the amount of training samples. Our results indicated that, in general, better performance is achieved when increasing the amount of unique data samples or augmenting the data more extensively, as expected. Surprisingly, however, we observed that beyond a certain amount of training samples, data augmentation led to significantly better performance than adding unique samples. Among the investigated augmentation methods, rotations were found to be the most beneficial for improving model performance.
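As a rough illustration of the kind of rotation augmentation described above, here is a minimal numpy sketch, assuming candidate-nodule patches are stored as 3D arrays in (z, y, x) order; the patch size and the `augment_rotations` helper are invented for the example:

```python
import numpy as np

def augment_rotations(patch):
    """Generate the four 90-degree rotations of a 3D CT patch in the
    axial (y, x) plane. These rotations preserve voxel values exactly,
    so no interpolation artefacts are introduced."""
    return [np.rot90(patch, k=k, axes=(1, 2)) for k in range(4)]

patch = np.random.rand(16, 32, 32)  # hypothetical candidate-nodule patch
augmented = augment_rotations(patch)
print(len(augmented), augmented[1].shape)  # 4 copies, shape preserved
```

A full pipeline would typically also use arbitrary-angle rotations, flips, and translations; the 90-degree case just shows the mechanism without interpolation.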
Subsequently, we investigated the benefit of combining deep learning with handcrafted features. We explored three fusion strategies of increasing complexity and assessed their performance for varying amounts of training data. Our findings indicated that combining handcrafted features with a 3D CNN approach significantly improved lung nodule detection performance compared to an independently trained CNN model, regardless of the fusion strategy. Comparatively larger gains were obtained when less training data was available. The fusion strategy in which features are combined with a CNN in a single end-to-end training scheme performed best overall, allowing training data to be reduced by 33% to 43% while maintaining performance. Among the investigated handcrafted features, those describing the relative position of the candidate with respect to the lung wall and mediastinum were found to be of most benefit.
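One way to picture such a fusion strategy is a classification head that sees the concatenation of the CNN embedding and the handcrafted features. The numpy-only sketch below is a hypothetical illustration; the vector sizes, the example feature values, and the `fuse_and_score` helper are invented for the example and are not taken from the thesis:

```python
import numpy as np

def fuse_and_score(cnn_embedding, handcrafted, weights, bias):
    """Concatenate a learned embedding with handcrafted features and
    apply a single linear classification head with a sigmoid output."""
    fused = np.concatenate([cnn_embedding, handcrafted])
    logit = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
cnn_embedding = rng.normal(size=64)      # hypothetical CNN feature vector
handcrafted = np.array([0.8, 0.1, 0.3])  # e.g. position relative to lung wall
weights = rng.normal(size=67)            # head sees 64 + 3 fused features
score = fuse_and_score(cnn_embedding, handcrafted, weights, 0.0)
print(0.0 < score < 1.0)                 # a probability-like nodule score
```

In the end-to-end variant described in the abstract, the head and the CNN would be trained jointly so gradients flow through both feature sources; the sketch only shows the fusion point itself.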
Finally, we considered the case in which abundant data is available but annotations are scarce, and investigated several methods to improve label-efficiency, as well as their combined effect. We proposed a framework that utilizes both annotated and unannotated data, can be pretrained via self-supervision, and allows combining handcrafted features with learned representations. Interestingly, the improvements in performance derived from the proposed learning schemes were found to accumulate, leading to increased label-efficiency when these strategies are combined. We observed a potential to decrease the amount of annotated data by up to 68% compared to traditional supervised training, while maintaining performance.
Our findings indicate that the investigated methods allow considerable reduction of data and/or annotations while maintaining model performance for lung nodule detection from CT imaging. Future work should investigate whether these results generalize to other domains, such that more applications that face challenges due to a shortage of annotated data may benefit from the potential of deep learning.
On June 13th 2024 at 16:00, Remco Royen will defend their PhD entitled “Addressing Labelling, Complexity, Latency, and Scalability in Deep Learning-Based Processing of Point Clouds”.
Everybody is invited to attend the presentation in room I.0.01, or digitally via this link.
In recent years, deep learning has gained widespread use, demonstrating its significance across various domains. Its ability to automatically learn intricate patterns from vast datasets has had a transformative impact, driving advancements in technology and reshaping the landscape of artificial intelligence applications. The ongoing development of increasingly sophisticated neural network architectures continues to push the boundaries of what is achievable across diverse sectors.
As a result, deep learning has become ubiquitous. However, certain limitations hinder its broad applicability. This thesis delves into four crucial challenges associated with deep learning-based point cloud processing: (i) the precise labeling of extensive datasets, (ii) the model complexity requirements, (iii) the latency introduced during inference, and (iv) the concept of scalability. The first challenge stems from the necessity for extensive datasets with highly accurate annotations. Particularly in the 3D domain, obtaining such high-quality annotations proves challenging and, consequently, expensive. The second challenge arises from the development of increasingly intricate and memory-intensive models, facilitated by advancements in high-power-consuming graphics cards. While these methods achieve higher performance levels, they impose constraints on deployment, particularly on embedded devices. Furthermore, the escalating complexity of these networks is accompanied by increased inference time, impeding real-time applications. Lastly, deep learning-based solutions lack the concept of scalability, which has proven vital in traditional methods.
In this thesis, we tackle these challenges and propose diverse solutions within the deep learning paradigm. The thesis commences with the introduction of a rapid 3D LiDAR simulator, designed to mitigate the labeling problem by learning from perfectly annotated synthetic data. We demonstrate its applications in 3D denoising and semantic segmentation. A second contribution lies within the domain of point cloud instance segmentation. Through the joint learning of prototypes and coefficients, we present an efficient and rapid method that demands relatively little GPU memory. To further improve our method, we introduce an enhanced block merging algorithm. As a third main contribution, we achieve deep learning-based quality scalability by learning embedded latent representations, demonstrating compelling results in applications such as image reconstruction, point cloud compression, and image semantic hashing. The final contribution introduces resolution-scalable 3D semantic segmentation of point clouds. When applied to resolution-scalable 3D sensors, it enables joint point cloud acquisition and processing.
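The essence of quality scalability with an embedded latent representation is that a prefix of the code already yields a coarse reconstruction, and appending dimensions refines it. The toy sketch below illustrates this with a linear orthonormal "decoder"; the dimensionalities and the `progressive_decode` helper are invented for the example and bear no relation to the actual networks in the thesis:

```python
import numpy as np

def progressive_decode(latent, decoder, ks):
    """Decode an embedded latent code truncated to its first k dimensions,
    for each k in ks. With an embedded code, every prefix is itself a
    valid (coarser) representation of the signal."""
    outs = []
    for k in ks:
        z = np.zeros_like(latent)
        z[:k] = latent[:k]        # keep only the first k latent dimensions
        outs.append(decoder @ z)  # linear toy decoder
    return outs

rng = np.random.default_rng(1)
D, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # orthonormal toy "decoder"
x = rng.normal(size=8)                        # signal to encode
latent = D.T @ x                              # full embedded latent code

recons = progressive_decode(latent, D, ks=[2, 4, 8])
errors = [np.linalg.norm(x - r) for r in recons]
print(all(errors[i] >= errors[i + 1] for i in range(2)))  # True: error shrinks as k grows
```

In the thesis this behavior is learned by a neural network rather than given by a fixed orthonormal basis, but the scalability property being targeted is the same: decoding any prefix length trades quality for rate.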
Our proposed methods consistently outperform established benchmarks across diverse datasets, as demonstrated through comprehensive experimentation. The research findings have been disseminated in various reputable journals and conferences, and have led to a patent submission, highlighting their impact in both academic and industrial contexts.