Learning From Detailed Maps: Joint 2D-3D Semantic Segmentation for Airborne Data with Selective Label Fusion
Keywords: Topographic Maps, Deep Learning, Multimodal Semantic Segmentation, 2D-3D Airborne Data, Label Fusion
Abstract. Objects for topographic maps are often extracted manually by interpreting and segmenting airborne data, such as 2D images and 3D point clouds. Deep learning (DL) with semantic segmentation can automate this process by using existing maps as ground-truth labels. However, current map-based DL methods are limited to either 2D or 3D, focus on urban regions, segment only a few generic classes, and overlook the effects of abstractions in map-derived labels. To overcome these limitations, we propose a segmentation method that uses maps as ground truth with (i) joint 2D and 3D networks using multi-scale feature learning to capture fine details and segment diverse objects, and (ii) a Selective Label Fusion module to refine predictions across both modalities, addressing the effects of map abstractions. Trained and tested in urban, rural, and forested regions, our method segments 11 map-based classes in 2D and 12 classes in 3D. At the class level, we achieve a mean Intersection over Union (mIoU) of 70% for both 2D and 3D, with label fusion improving 3D performance by 15% over non-fused results. Regionally, 4 out of 5 areas achieve mIoU above 60% in both modalities. These results demonstrate the potential of maps and DL to automate the labeling of images and point clouds, helping to create and update maps while also generating valuable labeled datasets for other computer vision tasks.
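The abstract does not specify how the Selective Label Fusion module combines the 2D and 3D predictions, so the following is only a minimal illustrative sketch, assuming a confidence-based rule over co-registered per-point class scores; the function name `selective_label_fusion`, the `conf_threshold` parameter, and the overwrite-only-when-more-confident rule are hypothetical and not taken from the paper.

```python
import numpy as np

def selective_label_fusion(probs_2d, probs_3d, conf_threshold=0.8):
    """Hypothetical sketch: fuse per-point class scores from a 2D and a 3D network.

    probs_2d : (N, C) softmax scores of the 2D branch, sampled at the image
               locations onto which the N points project.
    probs_3d : (N, C) softmax scores of the 3D branch for the same N points.

    The 3D label is kept unless the 2D branch is both confident and more
    confident than the 3D branch (the "selective" part of this sketch).
    """
    labels_2d = probs_2d.argmax(axis=1)
    labels_3d = probs_3d.argmax(axis=1)
    conf_2d = probs_2d.max(axis=1)
    conf_3d = probs_3d.max(axis=1)

    # Overwrite the 3D prediction only where the 2D branch clearly wins.
    take_2d = (conf_2d >= conf_threshold) & (conf_2d > conf_3d)
    return np.where(take_2d, labels_2d, labels_3d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_points, n_classes = 5, 12  # 12 classes, matching the 3D setting above
    p2d = rng.dirichlet(np.ones(n_classes), size=n_points)
    p3d = rng.dirichlet(np.ones(n_classes), size=n_points)
    print(selective_label_fusion(p2d, p3d))
```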