Thesis details
Overview
 
Annotation-efficient Medical Image Segmentation using Promptable Foundation Models 
 
...
Subject 
Medical image segmentation is an essential step in many biomedical applications, including
organ delineation, lesion quantification, treatment planning and disease monitoring.
Traditionally, segmentation requires either manual annotation by experts or fully
supervised deep learning models trained on carefully annotated datasets. Manual
annotation is time-consuming and expensive, while supervised models often require large
amounts of task-specific training data.
Recently, promptable foundation models for image segmentation, such as the Segment
Anything Model, have shown that interactive segmentation can be performed using simple
prompts such as points, bounding boxes or masks. Medical variants of these models have
also been proposed, aiming to adapt promptable segmentation to biomedical images such
as CT, MRI, ultrasound or microscopy. More recently, VoxTell has extended this paradigm
toward free-text promptable 3D medical image segmentation, directly mapping natural-language
descriptions to volumetric masks across modalities such as CT, PET and MRI.
Despite their promise, it remains unclear how well promptable segmentation models
generalize to medical imaging settings, where image characteristics differ significantly from
natural images and where prompt types now range from spatial cues to natural-language
descriptions. Medical objects can have weak boundaries, low contrast, large anatomical
variation, and inherently three-dimensional structure, making segmentation challenging
even with interactive guidance. Therefore, systematic evaluation is needed to determine
when these models are useful, how much annotation effort they can save, and how they
compare to traditional medical segmentation approaches.
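To make the prompt-to-mask idea concrete, the toy sketch below grows a binary mask from a single point prompt by flood-filling over similar intensities. This is purely illustrative: the function name and tolerance are assumptions for this example, and real promptable models such as the Segment Anything Model use a learned image encoder and prompt encoder rather than any hand-crafted rule.

```python
import numpy as np
from collections import deque

def segment_from_point(image, seed, tol=0.2):
    """Toy point-prompt segmentation: flood-fill from the seed pixel,
    accepting 4-connected neighbours whose intensity lies within `tol`
    of the seed intensity. Illustrative only -- not how SAM works."""
    h, w = image.shape
    ref = image[seed]
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(image[ny, nx] - ref) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Synthetic "lesion": a bright 3x3 square on a dark background.
img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0
mask = segment_from_point(img, (3, 3))  # point prompt inside the lesion
```

Even this crude rule shows why prompt placement matters: a seed on a weak boundary or in a low-contrast region, as is common in medical images, would leak into the background, which is exactly the robustness question the thesis evaluates.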
Kind of work 
The objective of this thesis is to investigate the performance and annotation efficiency of
promptable foundation models for medical image segmentation. The project will comprise
three main steps. Firstly, a literature study will be performed on medical image
segmentation, foundation models and promptable segmentation. Secondly, pretrained
promptable segmentation models will be evaluated on one or more public medical imaging
datasets. Thirdly, their performance will be compared with classical supervised
segmentation baselines and analysed in terms of accuracy, robustness and required user
interaction.
Framework of the Thesis 
The developments will be performed in Python, using open-source image processing and
deep learning frameworks.
The project will involve:
• Literature study on medical image segmentation and medical foundation models.
• Selection of one or more public segmentation datasets.
• Preprocessing of CT, MRI or other relevant medical imaging data.
• Implementation of an evaluation pipeline for promptable segmentation models.
• Evaluation of different prompting strategies, such as point prompts, bounding-box
prompts and automatically generated prompts.
• Comparison of general foundation models with medical-adapted models, where
available.
• Implementation or reuse of a supervised segmentation baseline, for example nnU-Net
or a MONAI-based U-Net.
• Quantitative evaluation using metrics such as Dice score, Hausdorff distance and
surface distance.
• Analysis of annotation efficiency, robustness and failure cases.
• Thesis writing.
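The quantitative metrics listed above can be sketched in a few lines of NumPy. The version below is a toy sketch: the Hausdorff distance is computed brute-force over all foreground voxels, which is fine for small examples but far too slow for full 3D volumes, where an evaluation pipeline would typically use an established metrics library instead.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def hausdorff(pred, gt):
    """Naive symmetric Hausdorff distance between the foreground
    voxels of two binary masks (brute-force pairwise distances)."""
    a = np.argwhere(pred)
    b = np.argwhere(gt)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

gt = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True      # ground-truth square
pred = np.zeros_like(gt)
pred[3:7, 3:7] = True    # prediction shifted by one voxel
```

On this example the shifted prediction scores a Dice of 0.5625 and a Hausdorff distance of sqrt(2), illustrating how the two metrics capture complementary aspects: volumetric overlap versus worst-case boundary error.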