“Signal Processing in the AI era” was the tagline of this year’s IEEE International Conference on Acoustics, Speech and Signal Processing, taking place in Rhodes, Greece.
In this context, Brent de Weerdt, Xiangyu Yang, Boris Joukovsky, Alex Stergiou and Nikos Deligiannis presented ETRO’s research during poster sessions and oral presentations, with novel ways to process and understand graph, video, and audio data. Nikos Deligiannis chaired a session on Graph Deep Learning, attended the IEEE T-IP Editorial Board Meeting, and had the opportunity to meet with collaborators from the VUB-Duke-UGent-UCL joint lab.
Featured articles:

Growth of personnel: five staff members: Jacques and Jean (assistants), Ingrid Sansens and André Pletinckx (technicians), Gilberte Lievens (secretary), and Oscar Steenhaut (HoD).
On June 13th 2024 at 16:00, Remco Royen will defend their PhD entitled “ADDRESSING LABELLING, COMPLEXITY, LATENCY, AND SCALABILITY IN DEEP LEARNING-BASED PROCESSING OF POINT CLOUDS”.
Everybody is invited to attend the presentation in room I.0.01, or digitally via this link.
In recent years, deep learning has gained widespread use, demonstrating its significance across various domains. Its ability to automatically learn intricate patterns from vast datasets has resulted in a transformative impact, driving advancements in technology, and reshaping the landscape of artificial intelligence applications. The ongoing development of increasingly sophisticated neural network architectures continues to push the boundaries of what is achievable across diverse sectors.
As a result, deep learning has become ubiquitous. However, certain limitations hinder its broad applicability. This thesis delves into four crucial challenges associated with deep learning-based point cloud processing: (i) the precise labeling of extensive datasets, (ii) the model complexity requirements, (iii) the latency introduced during inference, and (iv) the concept of scalability. The first challenge stems from the necessity for extensive datasets with highly accurate annotations. Particularly in the 3D domain, obtaining such high-quality annotations proves challenging and, consequently, expensive. The second challenge arises from the development of increasingly intricate and memory-intensive models, facilitated by advancements in high-power-consuming graphics cards. While these methods achieve higher performance levels, they impose constraints on deployment, particularly on embedded devices. Furthermore, the escalating complexity of these networks is accompanied by increased inference time, impeding real-time applications. Lastly, deep learning-based solutions lack the scalability that has proven vital in traditional methods.
In this thesis, we tackle these challenges and propose diverse solutions within the deep learning paradigm. The thesis commences with the introduction of a rapid 3D LiDAR simulator, designed to mitigate the labeling problem by learning from perfectly annotated synthetic data. We demonstrate its applications in 3D denoising and semantic segmentation. A second contribution can be found within the domain of point cloud instance segmentation. Through the joint learning of prototypes and coefficients, we present an efficient and rapid method that demands relatively low GPU memory. To further improve our method, we introduce an enhanced block merging algorithm. As a third main contribution, we achieve deep learning-based quality scalability by learning embedded latent representations, demonstrating compelling results in applications such as image reconstruction, point cloud compression, and image semantic hashing. The final contribution introduces resolution-scalable 3D semantic segmentation of point clouds. When applied to resolution-scalable 3D sensors, it enables joint point cloud acquisition and processing.
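The idea behind an embedded latent representation can be illustrated with a toy example: the first k dimensions of the code alone already yield a coarse reconstruction, and quality improves monotonically as more dimensions are kept. The sketch below is not the thesis code; it uses a PCA-style encoder as a stand-in, since principal components have this nesting property by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # 200 samples, 16 features
X -= X.mean(axis=0)                      # centre the data

# SVD gives principal directions ordered by explained variance,
# so truncating the latent code keeps the most informative part.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
latent = X @ Vt.T                        # full embedded latent code

def reconstruct(latent, Vt, k):
    """Decode using only the first k latent dimensions."""
    return latent[:, :k] @ Vt[:k, :]

# Reconstruction error shrinks as the embedded code is extended.
errors = [np.linalg.norm(X - reconstruct(latent, Vt, k)) for k in (2, 4, 8, 16)]
assert all(e1 >= e2 for e1, e2 in zip(errors, errors[1:]))
```

A quality-scalable decoder can thus serve low-bandwidth clients from a truncated code and refine the output as more of the same code arrives, rather than storing separate models per quality level.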
Our proposed methods consistently outperform established benchmarks across diverse datasets, as demonstrated through comprehensive experimentation. The research findings have been disseminated in various reputable journals and conferences, and have led to a patent submission, highlighting their impact in both academic and industrial contexts.
On July 1st 2024 at 16:00, Panagiotis Gonidakis will defend their PhD entitled “DATA- AND LABEL-EFFICIENT DEEP LEARNING FOR MEDICAL IMAGE ANALYSIS: APPLICATION TO LUNG NODULE DETECTION ON THORACIC CT”.
Everybody is invited to attend the presentation in room D.0.03, or digitally via this link.
Convolutional neural networks (CNNs) have been widely used to detect and classify various objects and structures in computer vision and medical imaging. Access to large sets of annotated data is commonly a prerequisite for achieving good performance. In medical imaging, acquiring adequate amounts of labelled data can be time-consuming and costly. Therefore, reducing the need for data, and in particular for the associated annotations, is of high importance for medical imaging applications. In this work, we investigated whether we can reduce the need for annotated data in a supervised learning classification problem.
We chose to tackle the problem of lung nodule detection in thoracic computed tomography (CT) imaging, as this widely investigated application allowed us to benefit from publicly available data and to benchmark our methods. We designed a 3D CNN architecture to perform patch-wise classification of candidate nodules for false positive reduction. We optimized its training, testing, and fine-tuning procedures, evaluated its performance, and compared it with other state-of-the-art approaches in the field.
Next, we explored how data augmentation can contribute towards more accurate and less data-demanding models. We investigated the relative benefit of increasing the amount of original data with respect to computationally augmenting the amount of training samples. Our results indicated that, in general, better performance is achieved when increasing the amount of unique data samples or augmenting the data more extensively, as expected. Surprisingly, however, we observed that after reaching a certain amount of training samples, data augmentation led to significantly better performance compared to adding unique samples. Among the investigated augmentation methods, rotations were found to be most beneficial for improving model performance.
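Rotation-based augmentation of volumetric patches can be sketched in a few lines. This is an illustrative example with our own naming, not the thesis pipeline: each candidate patch is rotated by 90-degree multiples in the axial plane, multiplying the number of training samples by four without changing the label.

```python
import numpy as np

def augment_rotations(patch):
    """Return the four axial 90-degree rotations of a (z, y, x) patch."""
    return [np.rot90(patch, k, axes=(1, 2)) for k in range(4)]

patch = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
augmented = augment_rotations(patch)

assert len(augmented) == 4
assert all(a.shape == patch.shape for a in augmented)
# Rotations only rearrange voxels; intensities are preserved.
assert all(np.isclose(a.sum(), patch.sum()) for a in augmented)
```

In practice such transforms are applied on the fly during training; finer-grained rotation angles would require interpolation, which 90-degree steps avoid.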
Subsequently, we investigated the benefit of combining deep learning with handcrafted features. We explored three fusion strategies of increasing complexity and assessed their performance for varying amounts of training data. Our findings indicated that combining handcrafted features with a 3D CNN approach significantly improved lung nodule detection performance compared to an independently trained CNN model, regardless of the fusion strategy. Comparatively larger gains were obtained when less training data was available. The fusion strategy in which features are combined with a CNN in a single end-to-end training scheme performed best overall, allowing training data to be reduced by 33% to 43% while maintaining performance. Among the investigated handcrafted features, those describing the relative position of the candidate with respect to the lung wall and mediastinum were found to be of most benefit.
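The end-to-end fusion idea can be sketched as concatenating handcrafted descriptors with the CNN's learned embedding before a single classification head. The feature names and dimensions below are illustrative assumptions, not the thesis setup, and the random "embedding" stands in for a real CNN output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-candidate features: a learned embedding (placeholder for the 3D
# CNN output) and a few handcrafted descriptors, e.g. the relative
# position of the candidate w.r.t. the lung wall (hypothetical names).
cnn_embedding = rng.normal(size=(8, 64))
handcrafted = rng.normal(size=(8, 5))

# Fusion: one joint vector per candidate feeds a single head, so in the
# real pipeline gradients reach the CNN and the head together.
fused = np.concatenate([cnn_embedding, handcrafted], axis=1)

w = rng.normal(size=(69,))                     # head weights (would be learned)
scores = 1.0 / (1.0 + np.exp(-fused @ w))      # sigmoid nodule probability

assert fused.shape == (8, 69)
assert np.all((scores > 0) & (scores < 1))
```

The two simpler strategies from the text would instead train the CNN and the feature-based classifier separately and merge their outputs, which this single-head scheme outperformed.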
Finally, we considered the case in which abundant data is available but annotations are scarce, and investigated several methods to improve label-efficiency, as well as their combined effect. We proposed a framework that utilizes both annotated and unannotated data, can be pretrained via self-supervision, and allows handcrafted features to be combined with learned representations. Interestingly, the improvements in performance derived from the proposed learning schemes were found to accumulate, leading to increased label-efficiency when these strategies are combined. We observed a potential to decrease the amount of annotated data by up to 68% compared to traditional supervised training, while maintaining performance.
Our findings indicate that the investigated methods allow considerable reduction of data and/or annotations while maintaining model performance for lung nodule detection from CT imaging. Future work should investigate whether these results generalize to other domains, such that more applications that face challenges due to a shortage of annotated data may benefit from the potential of deep learning.
On October 9th 2024 at 16:30, Esther Rodrigo Bonet will defend their PhD entitled “EXPLAINABLE AND PHYSICS-GUIDED GRAPH DEEP LEARNING FOR AIR POLLUTION MODELLING”.
Everybody is invited to attend the presentation in room I.0.02.
Air pollution has become a worldwide concern due to its negative impact on the population’s health and well-being. To mitigate its effects, it is essential to accurately monitor pollutant concentrations across regions and over time. Traditional solutions rely on physics-driven approaches, leveraging particle motion equations to predict how pollutants shift over time. Despite being reliable and easy to interpret, they are computationally expensive and require background domain knowledge. Alternatively, recent works have shown that data-driven approaches, especially deep learning models, significantly reduce the computational expense and provide accurate predictions, yet at the cost of massive data and storage requirements and lower interpretability.
This PhD research develops innovative air pollution monitoring solutions focusing on high accuracy, manageable complexity, and high interpretability. To this end, the research proposes various graph-based deep learning solutions focusing on two key aspects, namely, physics-guided deep learning and explainability.
First, as there exist correlations among the data points in smart city data, we propose exploiting them using graph-based deep learning techniques. Specifically, we leverage generative models that have proven efficient in data generation tasks, namely, variational graph autoencoders. The proposed models employ graph convolutional operations and data fusion techniques to leverage the graph structure and the multi-modality of the data at hand. Additionally, we design physics-guided deep learning models that follow well-studied physical equations. By updating the graph convolution operator of graph convolutional networks to incorporate the physical convection-diffusion equation, we can physically guide the learning of our network.
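To give an intuition for physics-guided graph operators, the sketch below discretises the diffusion half of the convection-diffusion equation on a small sensor graph, using the graph Laplacian in place of the spatial Laplacian. This is a generic textbook construction under our own assumptions, not the operator developed in the thesis.

```python
import numpy as np

# A path graph of four monitoring stations (adjacency matrix).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian

def diffusion_step(x, dt=0.1, kappa=1.0):
    """One explicit Euler step of du/dt = -kappa * L u on the graph."""
    return x - dt * kappa * (L @ x)

x = np.array([4.0, 0.0, 0.0, 0.0])      # all pollutant mass at node 0
for _ in range(50):
    x = diffusion_step(x)

# Diffusion conserves total mass and smooths concentrations across nodes.
assert np.isclose(x.sum(), 4.0)
assert x.max() < 4.0
```

A physics-guided network would embed such a step inside its graph convolution, so that learned message passing stays consistent with how pollutants actually spread; a convection term would additionally transport mass along wind-direction edges.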
The second key point relates to explainability. Specifically, we design novel explainability techniques for interpretable graph deep learning. We explore existing explainability algorithms, including Lasso and a layer-wise relevance propagation approach, and extend them to our graph-based architectures, designing efficient and specifically tailored explanation tools. Our explanation techniques provide insights and visualizations based on the various input data sources.
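Layer-wise relevance propagation can be illustrated on a single linear layer: output relevance is redistributed to inputs in proportion to their contributions z_ij = a_i * w_ij. This is the generic epsilon-LRP rule, shown here as a toy sketch rather than the tailored explainers developed in the thesis.

```python
import numpy as np

def lrp_linear(a, w, relevance_out, eps=1e-6):
    """Propagate relevance back through y = a @ w (epsilon rule)."""
    z = a[:, None] * w                            # contributions z_ij
    denom = z.sum(axis=0)                         # pre-activations y_j
    denom = denom + eps * np.sign(denom)          # stabiliser
    return (z / denom) @ relevance_out            # relevance per input

a = np.array([1.0, 2.0, 0.5])                     # input activations
w = np.array([[ 0.2, -0.1],
              [ 0.4,  0.3],
              [-0.2,  0.5]])
relevance_out = a @ w                             # seed with the outputs
relevance_in = lrp_linear(a, w, relevance_out)

# The epsilon rule (approximately) conserves total relevance.
assert np.isclose(relevance_in.sum(), relevance_out.sum(), atol=1e-4)
```

Applied layer by layer from the prediction back to the input, the same rule yields a per-feature (or per-node, in the graph case) attribution map that can be visualised.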
Overall, the research has produced state-of-the-art models that combine the best of both physics-guided graph deep learning and explainable approaches for inferring, predicting, and explaining air pollution. The developed techniques can also be applied to other graph-modelling applications on the Internet, such as recommender systems.
It is possible to take the preparatory program in parallel with the master program; if you choose to do so, it does not add to the study duration of the 2-year master program.
A fully immersive Augmented Reality experience for neurosurgical planning and real-time intervention, demonstrated by Taylor in the FARI immersive CAVE during the Agoria HealthTech roundtable event on June 17th, 2024.

