“Signal Processing in the AI era” was the tagline of this year’s IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), held in Rhodes, Greece.
In this context, Brent de Weerdt, Xiangyu Yang, Boris Joukovsky, Alex Stergiou and Nikos Deligiannis presented ETRO’s research during poster sessions and oral presentations, with novel ways to process and understand graph, video, and audio data. Nikos Deligiannis chaired a session on Graph Deep Learning, attended the IEEE T-IP Editorial Board Meeting, and had the opportunity to meet with collaborators from the VUB-Duke-UGent-UCL joint lab.
Featured articles:
On May 25 2023 at 16.00, Redona Brahimetaj will defend her PhD entitled “Classification of breast cancer – in vitro microcalcification analysis in 3D micro-CT images”.
Everybody is invited to attend the presentation in room D2.01.
Breast cancer is the most commonly diagnosed cancer in women worldwide. On a mammogram, it can manifest as breast masses, subtle architectural distortions and/or microcalcifications (MCs). Among all the manifestations observed clinically, MCs are usually considered a robust marker of early breast cancer. Detecting, interpreting and discriminating the MCs found in benign versus malignant lesions remains a challenge for clinicians, given the limitations of the current standard screening modalities (e.g., lack of contrast, low 2D/3D resolution, superposition of tissue). Improved diagnostic accuracy for breast MCs is especially important for two reasons: (a) to assess the likelihood of malignancy at the very initial phase of the disease; and (b) to avoid unnecessary invasive interventions.
Despite the promising results that Computer Aided Detection and Diagnosis (CAD) systems have achieved over the past years, existing systems provide a diagnosis based on MC clusters visualized in 2D or in low-resolution 3D. In contrast to the majority of previous studies, this PhD thesis aims to provide breast cancer diagnosis based solely on the properties of individual MCs, visualized in 3D and at high resolution. Since in vivo high-resolution breast imaging is not yet possible, the images used were obtained by scanning breast biopsies with a micro-CT scanner.
Several contributions were achieved in this thesis. As a first contribution, we evaluated the feasibility of developing a machine learning CAD system able to diagnose breast cancer based only on handcrafted features of individual MCs. As a second contribution, we explored for the first time the impact of image resolution (8 µm, 16 µm, 32 µm, 64 µm) when diagnosing individual breast MCs. As a third contribution, we participated in a new data collection procedure and performed a sensitivity analysis exploring the effect of different segmentation thresholds on the diagnosis of individual MCs. As a fourth contribution, we evaluated the performance of a deep learning framework that provides a benign/malignant diagnosis based on features learned automatically from high-resolution 3D MC images.
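To give a concrete flavour of such a pipeline, here is a minimal illustrative sketch, not the actual system of the thesis: the specific shape descriptors, the random-forest classifier and the 8 µm voxel size are assumptions chosen for illustration. Handcrafted features of a segmented MC could feed a standard classifier like this:

# Illustrative sketch (not the thesis pipeline): classify individual
# microcalcifications (MCs) from handcrafted 3D shape features.
import numpy as np
from skimage import measure
from sklearn.ensemble import RandomForestClassifier

def shape_features(mc_mask, voxel_size_um=8.0):
    """Simple handcrafted descriptors of one binary 3D MC segmentation mask."""
    volume = mc_mask.sum() * voxel_size_um ** 3                    # voxel count -> um^3
    verts, faces, _, _ = measure.marching_cubes(mc_mask.astype(float), level=0.5,
                                                spacing=(voxel_size_um,) * 3)
    surface = measure.mesh_surface_area(verts, faces)              # um^2
    sphericity = (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface
    return [volume, surface, sphericity]

def train_cad(masks, labels):
    """masks: list of 3D boolean arrays (one per MC); labels: 0 = benign, 1 = malignant."""
    X = np.array([shape_features(m) for m in masks])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return clf.fit(X, labels)

Any classifier operating on such per-MC feature vectors would follow the same pattern; the thesis evaluates which handcrafted features and models are actually suitable.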
Although our research is not yet directly applicable in vivo (3D high-resolution in vivo breast imaging is still not possible), we demonstrated its potential for use in further research and clinical scenarios once current breast screening modalities improve. At the moment, the results achieved can potentially be used in intraoperative imaging to reduce the waiting time between tissue extraction and anatomopathological results. As a long-term goal, our study aims to help avoid unnecessary biopsies and considerably reduce costs for the healthcare system.
On November 7th 2024 at 16:00, Boris Joukovsky will defend their PhD entitled “Signal Processing Meets Deep Learning: Interpretable and Explainable Neural Networks for Video Analysis, Sequence Modeling and Compression”.
Everybody is invited to attend the presentation in room I.0.01 or online via this link.
Deep learning is increasingly used to solve signal processing tasks, and deep neural networks (DNNs) often outperform traditional methods while requiring little domain knowledge. However, DNNs behave as black boxes, making it difficult to understand their decisions. The empirical approaches used to design DNNs often lack theoretical guarantees and impose high computational requirements, which poses risks for applications requiring trustworthy artificial intelligence (AI). This thesis addresses these issues, focusing on video processing and sequential problems across three domains: (1) efficient, model-based DNN designs; (2) generalization analysis and information-theory-driven learning; and (3) post-hoc explainability.
The first contributions consist of new deep learning models for successive frame reconstruction, foreground-background separation, and moving object detection in video. These models are based on the deep unfolding method, a hybrid approach that combines deep learning with optimization techniques and leverages low-complexity prior knowledge of the data. The resulting networks require fewer parameters than standard DNNs, and they outperform DNNs of comparable size, large semantic-based convolutional networks, as well as the underlying non-learned optimization methods.
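To illustrate the deep unfolding idea in general terms, here is a minimal sketch of an unrolled ISTA (LISTA-style) network; it is not the reweighted-RNN architecture of the thesis, and the layer sizes, shared weights and threshold parameterization are assumptions chosen for brevity. The iterations of a sparse-coding solver become the layers of a small learned network:

# Minimal deep-unfolding sketch: each "layer" is one learned ISTA iteration.
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    def __init__(self, n_measurements, n_features, n_iterations=5):
        super().__init__()
        self.W = nn.Linear(n_measurements, n_features, bias=False)  # learned analogue of A^T / L
        self.S = nn.Linear(n_features, n_features, bias=False)      # learned analogue of I - A^T A / L
        self.threshold = nn.Parameter(torch.tensor(0.1))            # learned soft-threshold level
        self.n_iterations = n_iterations

    def soft_threshold(self, z):
        return torch.sign(z) * torch.clamp(z.abs() - self.threshold, min=0.0)

    def forward(self, y):
        # y: batch of measurements; x: sparse codes refined over the unfolded iterations.
        x = torch.zeros(y.shape[0], self.S.in_features, device=y.device)
        for _ in range(self.n_iterations):
            x = self.soft_threshold(self.W(y) + self.S(x))
        return x

Training such a network end-to-end keeps the parameter count low (only W, S and the threshold), which is what gives unfolded models their efficiency relative to generic DNNs.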
The second area focuses on the theoretical generalization of deep unfolding models. The generalization error of reweighted-RNN (the model that performs video reconstruction) is characterized using Rademacher complexity analysis. This is a first-of-its-kind result that bridges machine learning theory with deep unfolding RNNs.
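For reference, the generic form of a Rademacher-complexity generalization bound (the standard textbook statement for a loss bounded in [0, 1], not the thesis-specific bound for reweighted-RNN) is:

\[
\mathbb{E}\big[\ell(f(X),Y)\big] \;\le\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i),y_i\big)
\;+\; 2\,\widehat{\mathfrak{R}}_S\big(\ell \circ \mathcal{F}\big)
\;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}},
\]

which holds with probability at least \(1-\delta\) over an i.i.d. sample \(S\) of size \(n\), where \(\widehat{\mathfrak{R}}_S\) is the empirical Rademacher complexity of the hypothesis class \(\mathcal{F}\) composed with the loss; the thesis derives such a characterization for the unfolded recurrent architecture.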
Another contribution in this area aims to learn optimally compressed, quality-scalable representations of distributed signals, a scheme traditionally known as Wyner-Ziv coding (WZC). The proposed method shows that deep models can recover layered binning solutions akin to optimal WZC, which is promising for learning constructive coding schemes for distributed applications.
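For context, the classical Wyner-Ziv rate-distortion function (a standard information-theoretic result, quoted here for reference rather than as part of the thesis) is:

\[
R_{\mathrm{WZ}}(D) \;=\; \min_{p(u\mid x),\; f:\,\mathcal{U}\times\mathcal{Y}\to\hat{\mathcal{X}}} I(X;U\mid Y)
\quad \text{s.t.} \quad U - X - Y \text{ is a Markov chain and } \mathbb{E}\big[d\big(X, f(U,Y)\big)\big] \le D,
\]

where \(Y\) is side information available only at the decoder; the learned layered binning solutions mentioned above are akin to this optimum.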
The third area introduces InteractionLIME, an algorithm to explain how deep models learn multi-view or multi-modal representations. It is the first model-agnostic explanation method designed to identify the important feature pairs across inputs that affect the prediction. Experimental results demonstrate its effectiveness on contrastive vision and language models.
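As a rough illustration of perturbation-based pairwise attribution across two inputs, here is a generic LIME-style sketch; the actual InteractionLIME algorithm, its sampling scheme and its surrogate model are defined in the thesis and may differ. The idea sketched here is to perturb both inputs jointly and fit a surrogate model on cross-input interaction terms:

# Generic sketch of pairwise attribution across two inputs (LIME-style);
# not the InteractionLIME algorithm itself.
import numpy as np
from sklearn.linear_model import Ridge

def pairwise_attributions(model, x_a, x_b, n_samples=1000, seed=0):
    """model(x_a, x_b) -> scalar score (e.g., image-text similarity).
    x_a, x_b: 1-D feature vectors of the two inputs (e.g., patch/token presence)."""
    rng = np.random.default_rng(seed)
    d_a, d_b = len(x_a), len(x_b)
    masks_a = rng.integers(0, 2, size=(n_samples, d_a))
    masks_b = rng.integers(0, 2, size=(n_samples, d_b))
    # Score each perturbed pair with features randomly switched on/off.
    scores = np.array([model(x_a * ma, x_b * mb) for ma, mb in zip(masks_a, masks_b)])
    # Design matrix of cross-input interaction terms (one column per feature pair).
    interactions = np.einsum('ni,nj->nij', masks_a, masks_b).reshape(n_samples, -1)
    surrogate = Ridge(alpha=1.0).fit(interactions, scores)
    return surrogate.coef_.reshape(d_a, d_b)  # weight of each (feature_a, feature_b) pair

The returned matrix ranks which cross-input feature pairs most influence the model's score, which is the kind of explanation the method targets.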
In conclusion, this thesis addresses important challenges in making deep learning models more interpretable, efficient, and theoretically grounded, particularly for video processing and sequential data, thereby contributing to the development of more trustworthy AI systems.
Some ETRO staff went to the recent gala ball organised by the engineering students association (PK). We also know how to partyyyy!
On Monday September 19, Prof. Nikos Deligiannis, Prof. Bruno Da Silva and Prof. Bart Jansen gave a warm welcome to the new generation of MACS students. We expect 50+ new students in the first master year.
On November 14 2022 at 17.00, Ségolène Rogge will defend her PhD entitled “Depth estimation in multiview light field camera system”.
Everybody is invited to attend the presentation in room D.2.01.
In this research, we went through the main stages required to render a 3D scene with 6 Degrees of Freedom: data acquisition, point cloud or depth map estimation, surface reconstruction, and view synthesis rendering.
We generated datasets, some computer-generated using Blender and some of real scenery, for which we built two acquisition rigs. The first rig could hold any type of camera (RGB or Time-of-Flight), which could then be moved along the X, Y or Z axis, thus sampling the scene anywhere within a cubic meter; the second, more rigid, held a 3-by-3 array of Light Field cameras. The acquired data was used to devise depth estimation algorithms based on multi-view stereo matching, improved with deep learning techniques. We also worked on the triangulation of a fast laser dot moving through the scene. We showed that the depth map obtained from one light field camera can be improved by using neighboring cameras, thereby increasing the parallax. As each generated depth map can be reprojected in space to form a point cloud, we implemented an improved registration algorithm to merge multiple point clouds while enforcing the 3D structure of the scene.
Finally, we rendered scenes on two different devices: a head-mounted display and a holographic display. To render a point cloud in real time within an Oculus headset, we stored it in an efficient data structure and optimized the rendering time by drawing subsamples of the point cloud, selected through level-of-detail based on the distance between the user and the various parts of the scene, and through frustum culling. Visual comfort was provided by means of splatting techniques that simulate the surfaces of the scene. To render a scene on a holographic screen, we first captured it from different viewpoints with light field cameras, then estimated depth in order to synthesize intermediate views, and used these as input for the display.
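As an illustration of the distance-based level-of-detail and frustum-culling idea, here is a simplified sketch with a cone-shaped frustum approximation and random subsampling; the actual renderer, data structure and parameters used in the thesis differ. Each frame, one could select the points to draw as follows:

# Illustrative sketch (not the thesis renderer): choose which points of a cloud
# to draw using distance-based level-of-detail and a simple view-frustum cull.
import numpy as np

def select_points(points, cam_pos, cam_dir, fov_deg=90.0, max_dist=50.0, seed=0):
    """points: (N, 3) array; cam_pos: camera position; cam_dir: unit view direction."""
    to_pts = points - cam_pos
    dist = np.linalg.norm(to_pts, axis=1)
    # Frustum culling approximated by a view cone: keep points within the field of view.
    cos_angle = (to_pts @ cam_dir) / np.maximum(dist, 1e-9)
    visible = (cos_angle > np.cos(np.radians(fov_deg / 2))) & (dist < max_dist)
    # Level of detail: the fraction of points kept shrinks with distance to the camera.
    keep_prob = np.clip(1.0 - dist / max_dist, 0.05, 1.0)
    kept = visible & (np.random.default_rng(seed).random(len(points)) < keep_prob)
    return points[kept]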
Our work achieved state-of-the-art results, in particular in depth estimation for light field images and in point cloud registration, and was published in various journals and conference proceedings. It also led to multiple contributions to the Moving Picture Experts Group (MPEG).