Public defense of doctoral thesis in computer science - Robin Ghyselinck
FROM PIXELS TO PRACTICE: DEEP LEARNING IN MEDICAL COMPUTER-AIDED DETECTION – ENHANCING ENDOSCOPY WITH COMPUTER VISION
Deep learning has revolutionized computer vision in recent years and has been applied to many fields. This thesis focuses on medical endoscopy, where deep learning can assist physicians in many tasks, such as navigating the lungs during bronchoscopy, assisting in the detection of lung diseases, detecting Crohn's disease from capsule endoscopy (PillCam), or automating the detection of polyps during colonoscopy procedures.
This thesis, entitled From Pixels to Practice: Deep Learning for Endoscopy, explores how modern neural networks and learning paradigms can improve visual understanding in endoscopy, with the aim of contributing to computer-aided detection (CAD) systems that can be integrated into clinical workflows.
This work follows an article-based structure. It links methodological advances in geometric and temporal modeling to techniques for handling data scarcity and imbalance, and to the practical implications of deep learning for lung tumor detection, from both a clinical and a practitioner perspective.
The first part of the manuscript provides a common foundation for all subsequent parts. Chapter 1 gives a general introduction to machine learning, explaining concepts such as classification, loss functions, and artificial neural networks. Chapter 2 focuses on deep learning for computer vision, detailing the main vision tasks, convolutional neural networks, ResNet, and U-Net. Chapter 3 describes medical imaging, with a focus on computed tomography (CT) scans and optical imaging.
The second part of the thesis focuses on learning spatio-temporal representations. In Chapter 4, we use deep neural networks combining spatial features and temporal recurrence to detect the bronchial carina, an anatomical landmark that helps doctors navigate the lungs. By evaluating classification (ResNet-50), segmentation (nnU-Net), and recurrent (GRU) models on a bronchoscopy dataset we created, the study highlights the benefits of combining information from segmentation masks and temporal features. Chapter 5 continues with the segmentation task by analyzing the extent to which rotation-equivariant U-Nets, based on E(2)-CNNs with the C4, C8, and D4 symmetry groups, can improve performance when the orientation of objects in the image is arbitrary. Together, these chapters show how temporal and geometric modeling capture complementary aspects of visual structure. They also highlight that data imbalance and scarcity are recurring problems in deep learning.
The third part studies learning under data scarcity and imbalance. First, Chapter 6 explores supervised contrastive pre-training [1] on large, domain-close endoscopic datasets (Hyper-Kvasir [2], LDPolyp [3]), whose learned representations are then transferred to smaller, disease-specific data (Crohn-IPI [4]). This approach outperforms both ImageNet pre-training and cross-entropy-based pre-training, highlighting the value of domain-specific contrastive representations. Next, Chapter 7 introduces Mask-Aware Cropping (MAC), a new data augmentation technique that mitigates pixel-level imbalance in segmentation. Across datasets with different imbalance regimes (URDE [5], Kvasir-SEG [6], HAM10000 [7]), MAC consistently improves Dice and IoU metrics under conditions of extreme imbalance. Together, these methods form a data-centric framework for effective learning when annotations are scarce or unevenly distributed.
The fourth part of the thesis focuses on deep learning in the operating room. Chapter 8 proposes a first model (ResNet-50) for the visual detection of lung cancer in bronchoscopy, trained on real in vivo data. The model outperforms junior physicians while remaining inferior to experts. This result shows that CAD systems for lung cancer detection are promising. Chapter 9 extends this work by evaluating the usability of a CAD system based on a deep learning model. The system combines probability indices, temporal graphs, and saliency map overlays, and is assessed in a multicenter evaluation with 10 physicians. The tool received favorable feedback, with high usability (a System Usability Scale, or SUS, score of 80.5 [8]) and strong clinical acceptance.
Beyond endoscopy, the results concerning rotation equivariance and pixel imbalance can be generalized to other fields such as microscopy, dermatology, and aerial imaging. This shows that the proposed methods are applicable to visual learning under structured variability and limited data constraints.
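The Dice and IoU metrics reported for Chapter 7 can be illustrated with a minimal Python sketch. It is not code from the thesis: the function names (`confusion_counts`, `dice`, `iou`, `pixel_accuracy`), the smoothing term `eps`, and the toy masks are all assumptions made for illustration. The sketch only shows why overlap metrics are preferred to plain pixel accuracy when the foreground covers a very small fraction of the image.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn


def dice(pred, target, eps=1e-8):
    """Dice coefficient: 2*TP / (2*TP + FP + FN); eps avoids division by zero."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def iou(pred, target, eps=1e-8):
    """Intersection over union: TP / (TP + FP + FN); eps avoids division by zero."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    return (tp + eps) / (tp + fp + fn + eps)


def pixel_accuracy(pred, target):
    """Fraction of pixels labelled correctly, foreground and background alike."""
    tp, _, _, tn = confusion_counts(pred, target)
    return (tp + tn) / len(pred)


# Hypothetical toy mask: 10,000 pixels, of which only 100 (1%) are foreground.
target = [1] * 100 + [0] * 9900

# Prediction A: the model predicts background everywhere.
empty_pred = [0] * 10000

# Prediction B: the model finds half of the foreground, plus 50 false positives.
partial_pred = [1] * 50 + [0] * 50 + [1] * 50 + [0] * 9850

for name, pred in [("empty", empty_pred), ("partial", partial_pred)]:
    print(name, pixel_accuracy(pred, target), dice(pred, target), iou(pred, target))
```

In this toy setting both predictions reach the same pixel accuracy, because background pixels dominate the count, while Dice and IoU clearly separate the prediction that finds part of the foreground from the one that finds none of it.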
Keywords: machine learning, computer vision, medicine, endoscopy, convolutional neural networks, segmentation, recurrent models, equivariance.