3rd International Conference on Vision Computing (VISCOM 2025)

Accepted Papers

Enhancing Bird Segmentation via Fine-tuned Monocular Depth Estimation: A Two-phase Multimodal Approach

Winner Bryan Kazaka, Khoury College of Computer Sciences, Northeastern University, USA

ABSTRACT

Monocular depth estimation provides geometric cues for computer vision tasks, but its effectiveness depends on domain-specific fine-tuning. We investigate the impact of fine-tuned depth features on semantic segmentation using the CUB-200-2011 birds dataset. We fine-tune Depth Anything V2 on KITTI Odometry (328 samples), achieving 97.1% δ1 accuracy, then use depth and surface normals as additional U-Net input channels. Comparing two training phases (RGB-only vs. RGB+depth+normals), fine-tuned depth features yield improvements of +3.81% mIoU, +5.17% Dice, and +1.18% accuracy, validating that task-specific depth fine-tuning enhances downstream performance with limited data.
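
As a rough illustration of the quantities named in the abstract, the NumPy sketch below derives surface normals from a depth map, stacks them with RGB into a multi-channel input, and computes δ1, IoU, and Dice. It is not the author's implementation. The finite-difference normal estimate, the 7-channel layout (3 RGB + 1 depth + 3 normals), and all function names are assumptions made for illustration. δ1 follows the common definition: the fraction of pixels where max(pred/gt, gt/pred) < 1.25.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate unit surface normals from an (H, W) depth map.

    Assumption: simple finite differences; the paper may use another method.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals

def stack_inputs(rgb, depth):
    """Concatenate RGB (H, W, 3), depth (H, W), and normals into (H, W, 7).

    Assumption: this channel order is hypothetical.
    """
    normals = normals_from_depth(depth)
    return np.concatenate([rgb, depth[..., None], normals], axis=-1)

def delta1(pred, gt, threshold=1.25):
    """Fraction of pixels whose depth ratio max(pred/gt, gt/pred) < threshold.

    Both inputs must be strictly positive.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))

def iou_and_dice(pred_mask, gt_mask):
    """IoU and Dice for one pair of binary masks (1.0 if both are empty)."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    total = pred_mask.sum() + gt_mask.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```

For a 4×5 image, `stack_inputs(rgb, depth)` returns an array of shape `(4, 5, 7)`, which would correspond to a U-Net whose first convolution accepts seven input channels instead of three.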

Keywords

Monocular Depth Estimation, Semantic Segmentation, Multimodal Learning, Transfer Learning, U-Net Architecture.

Welcome to VISCOM 2025